Preface 


It was our privilege to serve as the program chairs for CAV 2019, the 31st International 
Conference on Computer-Aided Verification. CAV 2019 was held in New York, USA, 
during July 15-18, 2019. The tutorial day was on July 14, 2019, and the pre-conference 
workshops were held during July 13-14, 2019. All events took place in The New 
School in New York City. 

CAV is an annual conference dedicated to the advancement of the theory and 
practice of computer-aided formal analysis methods for hardware and software 
systems. The primary focus of CAV is to extend the frontiers of verification techniques by 
expanding to new domains such as security, quantum computing, and machine 
learning. This puts CAV at the cutting edge of formal methods research, and this year’s 
program is a reflection of this commitment. 

CAV 2019 received a very high number of submissions (258). We accepted 13 tool 
papers, two case studies, and 52 regular papers, which amounts to an acceptance rate of 
roughly 26%. The accepted papers cover a wide spectrum of topics, from theoretical 
results to applications of formal methods. These papers apply or extend formal methods 
to a wide range of domains such as concurrency, learning, and industrially deployed 
systems. The program featured invited talks by Dawn Song (UC Berkeley), Swarat 
Chaudhuri (Rice University), and Ken McMillan (Microsoft Research) as well as 
invited tutorials by Emina Torlak (University of Washington) and Ranjit Jhala (UC San 
Diego). Furthermore, we continued the tradition of Logic Lounge, a series of 
discussions on computer science topics targeting a general audience. 

In addition to the main conference, CAV 2019 hosted the following workshops: The 
Best of Model Checking (BeMC) in honor of Orna Grumberg, Design and Analysis of 
Robust Systems (DARS), Verification Mentoring Workshop (VMW), Numerical 
Software Verification (NSV), Verified Software: Theories, Tools, and Experiments 
(VSTTE), Democratizing Software Verification, Formal Methods for ML-Enabled 
Autonomous Systems (FoMLAS), and Synthesis (SYNT). 

Organizing a top conference like CAV requires a great deal of effort from the 
community. The Program Committee for CAV 2019 consisted of 79 members; a 
committee of this size ensures that each member has a reasonable number of 
papers to review in the allotted time. In all, the committee members wrote over 770 reviews while 
investing significant effort to maintain and ensure the high quality of the conference 
program. We are grateful to the CAV 2019 Program Committee for their outstanding 
efforts in evaluating the submissions and making sure that each paper got a fair chance. 

Like last year’s CAV, we made artifact evaluation mandatory for tool submissions 
and optional but encouraged for the rest of the accepted papers. The Artifact Evaluation 
Committee consisted of 27 reviewers who put in significant effort to evaluate each 
artifact. The goal of this process was to provide constructive feedback to tool 
developers and help make the research published in CAV more reproducible. The Artifact 
Evaluation Committee was generally quite impressed by the quality of the artifacts, 
and, in fact, all accepted tools passed the artifact evaluation. Among regular papers, 
65% of the authors submitted an artifact, and 76% of these artifacts passed the 
evaluation. We are also very grateful to the Artifact Evaluation Committee for their hard 
work and dedication in evaluating the submitted artifacts. 

CAV 2019 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2019 a success. First, we would like to thank Yu Feng and Ruben Martins for chairing 
the Artifact Evaluation Committee and Zvonimir Rakamaric for maintaining the CAV 
website and social media presence. We also thank Oksana Tkachuk for chairing the 
workshop organization process, Peter O’Hearn for managing sponsorship, and Thomas 
Wies for arranging student fellowships. We also thank Loris D’Antoni, Rayna 
Dimitrova, Cezara Dragoi, and Anthony W. Lin for organizing the Verification 
Mentoring Workshop and working closely with us. Last but not least, we would like to 
thank Kostas Ferles, Navid Yaghmazadeh, and members of the CAV Steering 
Committee (Ken McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for 
helping us with several important aspects of organizing CAV 2019. 

We hope that you will find the proceedings of CAV 2019 scientifically interesting 
and thought-provoking! 
Abstract. Mission-time LTL (MLTL) is a bounded variant of MTL over naturals 
designed to generically specify requirements for mission-based system operation 
common to aircraft, spacecraft, vehicles, and robots. Despite the utility of MLTL 
as a specification logic, major gaps remain in analyzing MLTL, e.g., for specification 
debugging or model checking, centering on the absence of any complete 
MLTL satisfiability checker. We prove that the MLTL satisfiability checking problem 
is NEXPTIME-complete and that satisfiability checking MLTL$_0$, the variant 
of MLTL where all intervals start at 0, is PSPACE-complete. We introduce translations 
for MLTL-to-LTL, MLTL-to-LTL$_f$, MLTL-to-SMV, and MLTL-to-SMT, 
creating four options for MLTL satisfiability checking. Our extensive experimental 
evaluation shows that the MLTL-to-SMT translation with the Z3 SMT solver 
offers the most scalable performance. 


1 Introduction 


Mission-time LTL (MLTL) [34] has the syntax of Linear Temporal Logic with the option 
of integer bounds on the temporal operators. It was created as a generalization of the 
variations [3,14,25] on finitely-bounded linear temporal logic, ideal for specification of 
missions carried out by aircraft, spacecraft, rovers, and other vehicular or robotic systems. 
MLTL provides the readability of LTL [32], while assuming, when a different duration is 
not specified, that all requirements must be upheld during the (a priori known) length of 
a given mission, such as during the half-hour battery life of an Unmanned Aerial System 
(UAS). Using integer bounds instead of real-number or real-time bounds leads to more 
generic specifications that are adaptable to model checking at different levels of 
abstraction, or runtime monitoring on different platforms (e.g., in software vs. in hardware). 
Integer bounds should be read as generic time units, referring to the basic temporal 
resolution of the system, which can generically be resolved to units such as clock ticks or 
seconds depending on the mission. Integer bounds also allow generic specification with 
respect to different granularities of time, e.g., to allow easy updates to model-checking 
models, and re-usable specifications for the same requirements on different embedded 
systems that may have different resource limits for storing runtime monitors. MLTL has 
been used in many industrial case studies [18,28,34,37,42-44], and was the official logic 
of the 2018 Runtime Verification Benchmark Competition [1]. Many specifications from 
other case studies, in logics such as MTL [3] and STL [25], can be represented in MLTL. 
We intuitively relate MLTL to LTL and MTL-over-naturals as follows: (1) MLTL formulas 
are LTL formulas with bounded intervals over temporal operators, and are interpreted over 
finite traces. (2) MLTL formulas are MTL-over-naturals formulas without any unbounded 
intervals, and are interpreted over finite traces. 


© The Author(s) 2019 
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 3-22, 2019. 
https://doi.org/10.1007/978-3-030-25543-5_1 


4 J. Li et al. 

Despite the practical utility of MLTL, no model checker currently accepts this logic 
as a specification language. The model checker nuXmv encodes a related logic for 
use in symbolic model checking, where the $\Box$ and $\Diamond$ operators of an LTLSPEC can 
have integer bounds [21], though bounds cannot be placed on the $\mathcal{U}$ or V (the Release 
operator of nuXmv) operators. 

We also critically need an MLTL satisfiability checker to enable specification 
debugging. Specification is a major bottleneck to the formal verification of mission-based, 
especially autonomous, systems [35], with a key part of the problem being the 
availability of good tools for specification debugging. Satisfiability checking is an integral 
tool for specification debugging: [38,39] argued that for every requirement $\varphi$ we need to 
check $\varphi$ and $\neg\varphi$ for satisfiability; we also need to check the conjunction of all 
requirements to ensure that they can all be true of the same system at the same time. 
Specification debugging is essential to model checking [39-41] because a positive answer 
may not mean there is no bug and a negative answer may not mean there is a bug 
if the specification is valid or unsatisfiable, respectively. Specification debugging is 
critical for synthesis and runtime verification (RV) since in these cases there is no model; 
synthesis and RV are both entirely dependent on the specification. For synthesis, 
satisfiability checking is the best-available specification-debugging technique, since other 
techniques, such as vacuity checking (cf. [6,10]), reference a model in addition to the 
specification. While there are artifacts one can use in RV, specification debugging is 
still limited outside of satisfiability checking yet central to correct analysis. A false 
positive due to RV of an incorrect specification can have disastrous consequences, such as 
triggering an abort of an (otherwise successful) mission to Mars. Arguably, the biggest 
challenge to creating an RV algorithm or tool is the dearth of benchmarks for 
checking correctness or comparatively analyzing these [36], where a benchmark consists of 
some runtime trace, a temporal logic formula reasoning about that trace, and some 
verdict designating whether the trace at a given time satisfies the requirement formula. An 
MLTL satisfiability solver is useful for RV benchmark generation [22]. 

Despite the critical need for an MLTL satisfiability solver, no such tool currently 
exists. To the best of our knowledge, there is only one available solver (zot [8]) for 
checking the satisfiability of MTL-over-naturals formulas, interpreted over infinite traces. 
Since MLTL formulas are interpreted over finite traces and there is no trivial reduction 
from one to the other, zot cannot be directly applied to MLTL satisfiability checking. 

Our approach is inspired by satisfiability-checking algorithms from other logics. 
For LTL satisfiability solving, we observe that there are multiple efficient translations 
from LTL satisfiability to model checking, using nuXmv [40]; we therefore consider 
here translations to nuXmv model checking, both indirectly (as a translation to LTL), 
and directly using the new KLIVE [13] back-end and the BMC back-end, taking 
advantage of the bounded nature of MLTL. The bounded nature of MLTL enables us to also 
consider a direct encoding at the word-level, suitable as input to an SMT solver. Our 
contribution is both theoretical and experimental. We first consider the complexity of such 
translations. We prove that the MLTL satisfiability checking problem is 
NEXPTIME-complete and that satisfiability checking MLTL$_0$, the variant of MLTL where all 
intervals start at 0, is PSPACE-complete. Secondly, we introduce translation algorithms 
for MLTL-to-LTL$_f$ (LTL over finite traces [14]), MLTL-to-LTL, MLTL-to-SMV, and 
MLTL-to-SMT, thus creating four options for MLTL satisfiability checking. Our results 
show that the MLTL-to-SMT translation with the Z3 SMT solver offers the most 
scalable performance, though the MLTL-to-SMV translation with an SMV model checker 
can offer the best performance when the intervals in the MLTL formulas are restricted 
to small ranges (less than 100). 


2 Preliminaries 


A (closed) interval over naturals $I = [a,b]$ ($0 \le a \le b$ are natural numbers) is a set of 
naturals $\{i \mid a \le i \le b\}$. $I$ is called bounded iff $b < +\infty$; otherwise $I$ is unbounded. 
MLTL is defined using bounded intervals. Unlike Metric Temporal Logic (MTL) [4], 
it is not necessary to introduce open or half-open intervals over the natural domain, as 
every open or half-open bounded interval is reducible to an equivalent closed bounded 
interval, e.g., $(1,2) = \emptyset$, $(1,3) = [2,2]$, $(1,3] = [2,3]$, etc. Let $AP$ be a set of atomic 
propositions; then the syntax of a formula in MLTL is 


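$$\varphi ::= \mathit{true} \mid \mathit{false} \mid p \mid \neg\varphi \mid \varphi \wedge \psi \mid \varphi \vee \psi \mid \Diamond_I \varphi \mid \Box_I \varphi \mid \varphi\,\mathcal{U}_I\,\psi \mid \varphi\,\mathcal{R}_I\,\psi$$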


where $I$ is a bounded interval, $p \in AP$ is an atom, and $\varphi$ and $\psi$ are subformulas. 

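Given two MLTL formulas $\varphi, \psi$, we denote $\varphi = \psi$ iff they are syntactically 
equivalent, and $\varphi \equiv \psi$ iff they are semantically equivalent, i.e., $\pi \models \varphi$ iff $\pi \models \psi$ for every 
finite trace $\pi$. In MLTL semantics, we define $\mathit{false} \equiv \neg\mathit{true}$, $\varphi \vee \psi \equiv \neg(\neg\varphi \wedge \neg\psi)$, 
$\neg(\varphi\,\mathcal{U}_I\,\psi) \equiv (\neg\varphi\,\mathcal{R}_I\,\neg\psi)$ and $\neg\Diamond_I \varphi \equiv \Box_I \neg\varphi$. MLTL keeps the standard operator 
equivalences from LTL, including $(\Diamond_I \varphi) \equiv (\mathit{true}\,\mathcal{U}_I\,\varphi)$, $(\Box_I \varphi) \equiv (\mathit{false}\,\mathcal{R}_I\,\varphi)$, and 
$(\varphi\,\mathcal{R}_I\,\psi) \equiv \neg(\neg\varphi\,\mathcal{U}_I\,\neg\psi)$. Notably, MLTL discards the neXt ($\mathcal{X}$) operator, which is 
essential in LTL [32], since $\mathcal{X}\varphi$ is semantically equivalent to $\Diamond_{[1,1]}\varphi$. 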

The semantics of MLTL formulas is interpreted over finite traces, with interval bounds 
written in base 10 (decimal). Let $\pi$ be a finite trace in which every position $\pi[i]$ ($i \ge 0$) is 
over $2^{AP}$, and $|\pi|$ denotes the length of $\pi$ ($|\pi| < +\infty$ when $\pi$ is a finite trace). We 
use $\pi_i$ ($|\pi| > i \ge 0$) to represent the suffix of $\pi$ starting from position $i$ (including $i$). 
Let $a, b \in \mathbb{N}$, $a \le b$; we define that $\pi$ models (satisfies) an MLTL formula $\varphi$, denoted as 
$\pi \models \varphi$, as follows: 


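- $\pi \models p$ iff $p \in \pi[0]$; 
- $\pi \models \neg\varphi$ iff $\pi \not\models \varphi$; 
- $\pi \models \varphi \wedge \psi$ iff $\pi \models \varphi$ and $\pi \models \psi$; 
- $\pi \models \varphi\,\mathcal{U}_{[a,b]}\,\psi$ iff $|\pi| > a$ and there exists $i \in [a,b]$, $i < |\pi|$, such that $\pi_i \models \psi$ and 
for every $j \in [a,b]$, $j < i$, it holds that $\pi_j \models \varphi$. 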
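To make the four semantic clauses concrete, here is a minimal Python sketch of an MLTL evaluator for BNF formulas. The tuple encoding (`True`, atom names as strings, `("not", f)`, `("and", f, g)`, `("U", a, b, f, g)`) and the name `holds` are our own illustrative assumptions, not part of the paper; a trace is a list of sets of atom names.

```python
from typing import List, Set, Tuple, Union

# Hypothetical encoding of BNF formulas: True, an atom name (str),
# ("not", f), ("and", f, g), or ("U", a, b, f, g) for f U_[a,b] g.
Formula = Union[bool, str, Tuple]


def holds(trace: List[Set[str]], i: int, phi: Formula) -> bool:
    """Return True iff the suffix pi_i of the finite trace satisfies phi."""
    n = len(trace)
    if phi is True:
        return True
    if isinstance(phi, str):
        # pi_i |= p iff p is in the first letter of pi_i, i.e. trace[i].
        return i < n and phi in trace[i]
    op = phi[0]
    if op == "not":
        return not holds(trace, i, phi[1])
    if op == "and":
        return holds(trace, i, phi[1]) and holds(trace, i, phi[2])
    if op == "U":
        _, a, b, f, g = phi
        # |pi_i| > a is required; offsets past the end of the trace do not exist.
        if n - i <= a:
            return False
        for k in range(a, min(b, n - i - 1) + 1):
            if holds(trace, i + k, g):
                return True   # f has held at every offset in [a, k)
            if not holds(trace, i + k, f):
                return False  # f must hold at every offset in [a, k) before g
        return False
    raise ValueError(f"unknown operator {op!r}")


trace = [set(), {"p"}, {"q"}]
print(holds(trace, 0, ("U", 1, 2, "p", "q")))  # True: p is only needed from position 1
print(holds(trace, 0, ("U", 0, 2, "p", "q")))  # False: p fails at position 0
```

The first call would be false under the MTL-over-naturals reading of Until, which also demands $p$ at position 0.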


Compared to the traditional MTL-over-naturals¹ [16], the Until formula in MLTL is 
interpreted in a slightly different way. In MTL-over-naturals, the satisfaction of $\varphi\,\mathcal{U}_I\,\psi$ 
requires $\varphi$ to hold from position 0 to the position where $\psi$ holds (in $I$), while in MLTL 
$\varphi$ is only required to hold within the interval $I$, before the time $\psi$ holds. From the 
perspective of writing specifications, cf. [34,37], this adjustment is more user-friendly. 


¹ In this paper, MTL-over-naturals is interpreted over finite traces. 
It is not hard to see that MLTL is as expressive as the standard MTL-over-naturals: 
the formula $\varphi\,\mathcal{U}_{[a,b]}\,\psi$ in MTL-over-naturals can be represented as $(\Box_{[0,a-1]}\varphi) \wedge 
(\varphi\,\mathcal{U}_{[a,b]}\,\psi)$ in MLTL (for $a > 0$; for $a = 0$ the two readings coincide); $\varphi\,\mathcal{U}_{[a,b]}\,\psi$ in MLTL can be 
represented as $\Diamond_{[a,a]}(\varphi\,\mathcal{U}_{[0,b-a]}\,\psi)$ in MTL-over-naturals. 
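Both translations can be sanity-checked by brute force on small traces. The sketch below is our own illustration, not the paper's tooling: it restricts $\varphi$ and $\psi$ to the atoms `p` and `q`, encodes both Until readings directly, and compares them on every trace over $\{p, q\}$ up to a small length. The function names and the size limits are hypothetical, and an exhaustive check of small cases is evidence rather than a proof.

```python
from itertools import product


def mtl_until(tr, a, b, p, q):
    # MTL-over-naturals (finite traces): q at some i in [a, b] and p at every j < i.
    return any(q in tr[i] and all(p in tr[j] for j in range(i))
               for i in range(a, min(b, len(tr) - 1) + 1))


def mltl_until(tr, a, b, p, q):
    # MLTL: |tr| > a, q at some i in [a, b], and p only needed at j in [a, i).
    return len(tr) > a and any(q in tr[i] and all(p in tr[j] for j in range(a, i))
                               for i in range(a, min(b, len(tr) - 1) + 1))


def mltl_always(tr, a, b, p):
    # G_[a,b] p == not (true U_[a,b] not p) under the MLTL semantics.
    return not (len(tr) > a and any(p not in tr[i]
                                    for i in range(a, min(b, len(tr) - 1) + 1)))


def check_translations(max_len=4, max_bound=3):
    letters = [set(), {"p"}, {"q"}, {"p", "q"}]
    for n in range(max_len + 1):
        for tr in product(letters, repeat=n):
            for a in range(max_bound + 1):
                for b in range(a, max_bound + 1):
                    # MTL -> MLTL: (G_[0,a-1] p) & (p U_[a,b] q); for a = 0 they coincide.
                    mltl = mltl_until(tr, a, b, "p", "q")
                    if a > 0:
                        mltl = mltl_always(tr, 0, a - 1, "p") and mltl
                    if mtl_until(tr, a, b, "p", "q") != mltl:
                        return False
                    # MLTL -> MTL: F_[a,a] (p U_[0,b-a] q), i.e. the Until evaluated at position a.
                    back = len(tr) > a and mtl_until(tr[a:], 0, b - a, "p", "q")
                    if back != mltl_until(tr, a, b, "p", "q"):
                        return False
    return True


print(check_translations())  # True
```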

We say an MLTL formula is in BNF if the formula contains only $\neg$, $\wedge$ and $\mathcal{U}_I$ 
operators. It is trivial to see that every MLTL formula can be converted to its (semantically) 
equivalent BNF with a linear cost. Consider $\varphi = (\neg a) \vee ((\neg b)\,\mathcal{R}_I\,(\neg c))$ as an example. 
Its BNF form is $\neg(a \wedge (b\,\mathcal{U}_I\,c))$. Unless stated otherwise, this paper assumes that 
every MLTL formula is in BNF. 
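The linear-cost conversion to BNF amounts to one constant-size rewrite per operator. A minimal Python sketch, assuming a hypothetical tuple encoding in which `True`/`False` are the constants, strings are atoms, and `("F", a, b, f)`, `("G", a, b, f)` and `("R", a, b, f, g)` stand for $\Diamond_{[a,b]}$, $\Box_{[a,b]}$ and $\mathcal{R}_{[a,b]}$. Double negations are cancelled so that the output matches the worked example in the text.

```python
def neg(f):
    """Negate f, cancelling a leading negation (so ~~f becomes f)."""
    return f[1] if isinstance(f, tuple) and f[0] == "not" else ("not", f)


def to_bnf(f):
    """Rewrite f into BNF (only not, and, U) with one rewrite per node."""
    if f is True or isinstance(f, str):
        return f
    if f is False:
        return ("not", True)                       # false == ~true
    op = f[0]
    if op == "not":
        return neg(to_bnf(f[1]))
    if op == "and":
        return ("and", to_bnf(f[1]), to_bnf(f[2]))
    if op == "or":                                 # f | g == ~(~f & ~g)
        return neg(("and", neg(to_bnf(f[1])), neg(to_bnf(f[2]))))
    if op == "U":
        return ("U", f[1], f[2], to_bnf(f[3]), to_bnf(f[4]))
    if op == "R":                                  # f R_I g == ~(~f U_I ~g)
        return neg(("U", f[1], f[2], neg(to_bnf(f[3])), neg(to_bnf(f[4]))))
    if op == "F":                                  # F_I f == true U_I f
        return ("U", f[1], f[2], True, to_bnf(f[3]))
    if op == "G":                                  # G_I f == ~F_I ~f
        return neg(("U", f[1], f[2], True, neg(to_bnf(f[3]))))
    raise ValueError(f"unknown operator {op!r}")


# (~a) | ((~b) R_[1,3] (~c))  becomes  ~(a & (b U_[1,3] c))
example = ("or", ("not", "a"), ("R", 1, 3, ("not", "b"), ("not", "c")))
print(to_bnf(example))  # ('not', ('and', 'a', ('U', 1, 3, 'b', 'c')))
```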

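The closure of an MLTL formula $\varphi$, denoted as $cl(\varphi)$, is a set of formulas such that: 
(1) $\varphi \in cl(\varphi)$; (2) $\psi \in cl(\varphi)$ if $\neg\psi \in cl(\varphi)$; (3) $\psi_1, \psi_2 \in cl(\varphi)$ if $\psi_1\ \mathit{op}\ \psi_2 \in cl(\varphi)$, where 
$\mathit{op}$ can be $\wedge$ or $\mathcal{U}_I$. Let $|cl(\varphi)|$ be the size of $cl(\varphi)$. Since the definition of $cl(\varphi)$ ignores 
the intervals in $\varphi$, $|cl(\varphi)|$ is linear in the number of operators in $\varphi$. We also define the 
closure(*) of an MLTL formula $\varphi$, denoted $cl^*(\varphi)$, as the set of formulas such that: (1) 
$cl(\varphi) \subseteq cl^*(\varphi)$; (2) if $\psi_1\,\mathcal{U}_{[a,b]}\,\psi_2 \in cl^*(\varphi)$ for $0 < a \le b$, then $\psi_1\,\mathcal{U}_{[a-1,b-1]}\,\psi_2$ is in 
$cl^*(\varphi)$; (3) if $\psi_1\,\mathcal{U}_{[0,b]}\,\psi_2 \in cl^*(\varphi)$ for $0 < b$, then $\psi_1\,\mathcal{U}_{[0,b-1]}\,\psi_2$ is in $cl^*(\varphi)$. Let $|cl^*(\varphi)|$ 
be the size of $cl^*(\varphi)$ and $K$ be the maximal natural number in the intervals of $\varphi$. It is 
not hard to see that $|cl^*(\varphi)|$ is at most $(K+1) \cdot |cl(\varphi)|$, i.e., in $O(K \cdot |cl(\varphi)|)$. 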
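Both closures can be computed with a worklist. The sketch below uses a hypothetical BNF tuple encoding (atoms as strings, `("not", f)`, `("and", f, g)`, `("U", a, b, f, g)`); it is our illustration of the definitions, not code from the paper.

```python
def cl(phi):
    """cl(phi): phi and all of its subformulas, ignoring intervals."""
    out, todo = set(), [phi]
    while todo:
        f = todo.pop()
        if f in out:
            continue
        out.add(f)
        if isinstance(f, tuple):
            # not/and children follow the tag; U children follow the two bounds.
            todo.extend(f[1:] if f[0] in ("not", "and") else f[3:])
    return out


def cl_star(phi):
    """cl*(phi): cl(phi) closed under the two interval-shrinking rules for U."""
    out, todo = set(), list(cl(phi))
    while todo:
        f = todo.pop()
        if f in out:
            continue
        out.add(f)
        if isinstance(f, tuple) and f[0] == "U":
            _, a, b, g, h = f
            if 0 < a <= b:
                todo.append(("U", a - 1, b - 1, g, h))   # rule (2)
            elif a == 0 < b:
                todo.append(("U", 0, b - 1, g, h))       # rule (3)
    return out


phi = ("U", 2, 4, "p", "q")
print(len(cl(phi)), len(cl_star(phi)))  # 3 7
```

For $p\,\mathcal{U}_{[2,4]}\,q$ the Until chain is $[2,4], [1,3], [0,2], [0,1], [0,0]$, that is, $b + 1$ formulas for a single Until.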

We also consider a fragment of MLTL, namely MLTL$_0$, which is more frequently 
used in practice, cf. [18,34]. Informally speaking, MLTL$_0$ formulas are MLTL formulas 
in which all intervals start from 0. For example, $\Box_{[0,4]} a \wedge (a\,\mathcal{U}_{[0,1]}\,b)$ is an MLTL$_0$ formula, 
while $\Box_{[2,4]} a$ is not. 

Given an MLTL formula $\varphi$, the satisfiability problem asks whether there is a finite 
trace $\pi$ such that $\pi \models \varphi$ holds. To solve this problem, we can reduce it to the 
satisfiability problem of the related logics LTL and LTL$_f$ (LTL over finite traces [14]), and 
leverage the off-the-shelf satisfiability checking solvers for these well-explored logics. 
We abbreviate MLTL, LTL, and LTL$_f$ satisfiability checking as MLTL-SAT, LTL-SAT, 
and LTL$_f$-SAT, respectively. 


LTL$_f$: Linear Temporal Logic over Finite Traces [14]. We assume readers are 
familiar with LTL (over infinite traces). LTL$_f$ is a variant of LTL that has the same syntax, 
except that for LTL$_f$, the dual operator of $\mathcal{X}$ is $\mathcal{N}$ (weak Next), which differs from $\mathcal{X}$ in the 
last state of the finite trace. In the last state of a finite trace, $\mathcal{X}\psi$ can never be satisfied, 
while $\mathcal{N}\psi$ is always satisfied. Given an LTL$_f$ formula $\varphi$, there is an LTL formula $\varphi'$ such that 
$\varphi$ is satisfiable iff $\varphi'$ is satisfiable. In detail, $\varphi' = \Diamond\mathit{Tail} \wedge t(\varphi)$, where $\mathit{Tail}$ is a new 
atom identifying the end of the satisfying trace and $t(\varphi)$ is constructed as follows: 


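- $t(p) = p$ where $p$ is an atom; 
- $t(\neg\psi) = \neg t(\psi)$; 
- $t(\mathcal{X}\psi) = \neg\mathit{Tail} \wedge \mathcal{X} t(\psi)$; 
- $t(\psi_1 \wedge \psi_2) = t(\psi_1) \wedge t(\psi_2)$; 
- $t(\psi_1\,\mathcal{U}\,\psi_2) = t(\neg\mathit{Tail} \wedge \psi_1)\,\mathcal{U}\,t(\psi_2)$. 


In the above reduction, $\varphi$ is in BNF. Since the reduction is linear in the size of the 
original LTL$_f$ formula and LTL-SAT is PSPACE-complete [45], LTL$_f$-SAT is also a 
PSPACE-complete problem [14]. 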
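The translation $t$ is a short recursive function. The following Python sketch assumes a hypothetical tuple encoding of BNF LTL$_f$ formulas (`True`, atoms as strings, `("not", f)`, `("and", f, g)`, `("X", f)`, `("U", f, g)`) and prints LTL in a nuXmv-like concrete syntax; we parenthesize defensively rather than rely on operator precedence, so treat the concrete syntax as an assumption. Since $\mathit{Tail}$ is an atom, $t(\neg\mathit{Tail} \wedge \psi_1) = \neg\mathit{Tail} \wedge t(\psi_1)$, which is what the Until case emits.

```python
def t(f):
    """LTL_f -> LTL body: guard every X and every Until step with !Tail."""
    if f is True:
        return "TRUE"
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "not":
        return f"!{t(f[1])}"
    if op == "and":
        return f"({t(f[1])} & {t(f[2])})"
    if op == "X":
        return f"(!Tail & X ({t(f[1])}))"
    if op == "U":
        return f"((!Tail & {t(f[1])}) U {t(f[2])})"
    raise ValueError(f"unknown operator {op!r}")


def ltlf_to_ltl(f):
    """phi' = F Tail & t(phi): phi is LTL_f-satisfiable iff phi' is LTL-satisfiable."""
    return f"((F Tail) & {t(f)})"


print(ltlf_to_ltl(("U", "p", ("X", "q"))))
# ((F Tail) & ((!Tail & p) U (!Tail & X (q))))
```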
3 Complexity of MLTL-SAT 


It is known that the complexity of MITL (Metric Interval Temporal Logic) 
satisfiability is EXPSPACE-complete, and the satisfiability complexity of the fragment of MITL 
named MITL$_{0,\infty}$ is PSPACE-complete [2]. MLTL (resp. MLTL$_0$) can be viewed as a 
variant of MITL (resp. MITL$_{0,\infty}$) that is interpreted over the naturals. We show that 
MLTL satisfiability checking is NEXPTIME-complete, via a reduction from MLTL to 
LTL$_f$. 


Lemma 1. Let $\varphi$ be an MLTL formula, and $K$ be the maximal natural appearing in the 
intervals of $\varphi$ ($K$ is set to 1 if there are no intervals in $\varphi$). There is an LTL$_f$ formula $\theta$ 
that recognizes the same language as $\varphi$. Moreover, the size of $\theta$ is in $O(K \cdot |cl(\varphi)|)$. 


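Proof (Sketch). For an MLTL formula $\varphi$, we define the LTL$_f$ formula $f(\varphi)$ recursively 
as follows: 

- If $\varphi = \mathit{true}$, $\mathit{false}$, or an atom $p$, $f(\varphi) = \varphi$; 
- If $\varphi = \neg\psi$, $f(\varphi) = \neg f(\psi)$; 
- If $\varphi = \xi \wedge \psi$, $f(\varphi) = f(\xi) \wedge f(\psi)$; 
- If $\varphi = \xi\,\mathcal{U}_{[a,b]}\,\psi$, 

$$f(\varphi) = \begin{cases} \mathcal{X}\big(f(\xi\,\mathcal{U}_{[a-1,b-1]}\,\psi)\big), & \text{if } 0 < a \le b;\\ f(\psi) \vee \big(f(\xi) \wedge \mathcal{X}(f(\xi\,\mathcal{U}_{[0,b-1]}\,\psi))\big), & \text{if } a = 0 \text{ and } 0 < b;\\ f(\psi), & \text{if } a = 0 \text{ and } b = 0; \end{cases}$$

where $\mathcal{X}$ represents the neXt operator in LTL$_f$. Let $\theta = f(\varphi)$; we can prove by induction 
that $\varphi$ and $\theta$ accept the same language. Moreover, based on the aforementioned construction, 
the size of $\theta$ is at most linear in $K \cdot |cl(\varphi)|$, i.e., in $O(K \cdot |cl(\varphi)|)$. 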
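The recursion for $f$ translates directly into code. Below is a minimal Python sketch (our illustration; the tuple encoding, with `("X", f)` and `("or", f, g)` added on the output side, is hypothetical). Counting syntax-tree nodes shows the $O(K \cdot |cl(\varphi)|)$ growth: for $\Diamond_{[0,K]} q = \mathit{true}\,\mathcal{U}_{[0,K]}\,q$ the output has $5K + 1$ nodes, although writing $K$ takes only about $\log_{10} K$ digits.

```python
def f(phi):
    """Lemma 1: translate a BNF MLTL formula into an equivalent LTL_f formula."""
    if phi is True or isinstance(phi, str):
        return phi
    op = phi[0]
    if op == "not":
        return ("not", f(phi[1]))
    if op == "and":
        return ("and", f(phi[1]), f(phi[2]))
    if op != "U":
        raise ValueError(f"unknown operator {op!r}")
    _, a, b, xi, psi = phi                 # phi = xi U_[a,b] psi
    if a > b:
        raise ValueError("empty interval")
    if a > 0:                              # shift the whole interval by one step
        return ("X", f(("U", a - 1, b - 1, xi, psi)))
    if b > 0:                              # a == 0 < b: unfold one step of the Until
        return ("or", f(psi), ("and", f(xi), ("X", f(("U", 0, b - 1, xi, psi)))))
    return f(psi)                          # a == b == 0


def size(theta):
    """Number of syntax-tree nodes of a formula."""
    if not isinstance(theta, tuple):
        return 1
    return 1 + sum(size(c) for c in theta[1:])


print(f(("U", 0, 1, True, "q")))  # ('or', 'q', ('and', True, ('X', 'q')))
print([size(f(("U", 0, k, True, "q"))) for k in (1, 10, 100)])  # [6, 51, 501]
```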


We use the construction shown in Lemma 1 to explore several useful properties of 
MLTL. For instance, the LTL$_f$ formula translated from an MLTL formula contains only 
the $\mathcal{X}$ temporal operator or its dual $\mathcal{N}$, which represents weak Next [19,23], and the 
number of these operators is strictly smaller than $K \cdot |cl(\varphi)|$. Every $\mathcal{X}$ or $\mathcal{N}$ subformula 
in the LTL$_f$ formula corresponds to some temporal formula in $cl^*(\varphi)$. Notably, because 
the natural-number intervals in $\varphi$ are written in base 10 (decimal) notation, the blow-up 
in the translation of Lemma 1 is exponential. 

The next lower bound is reminiscent of the NEXPTIME-lower bound shown in [31] 
for a fragment of Metric Interval Temporal Logic (MITL), but is different in the details 
of the proof as the two logics are quite different. 


Theorem 1. The complexity of MLTL satisfiability checking is NEXPTIME-complete. 


Proof (Sketch). By Lemma 1, there is an LTL$_f$ formula $\theta$ that accepts the same traces 
as the MLTL formula $\varphi$, and the size of $\theta$ is in $O(K \cdot |cl(\varphi)|)$. The only temporal 
connectives used in $\theta$ are $\mathcal{X}$ and $\mathcal{N}$, since the translation to LTL$_f$ reduces all MLTL temporal 
connectives in $\varphi$ to nested $\mathcal{X}$'s or $\mathcal{N}$'s (the latter produced by simplifying $\neg\mathcal{X}$). Thus, if $\theta$ is 
satisfiable, then it is satisfiable by a trace whose length is bounded by the length of $\theta$. 
Thus, we can just guess a trace $\pi$ whose length is at most the length of $\theta$, which is exponential 
in the size of $\varphi$, and check that it satisfies $\varphi$. As a result, the upper bound for MLTL-SAT is NEXPTIME. 
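A deterministic stand-in for the guess-and-check argument is to enumerate candidate traces up to a length bound and evaluate the formula on each. The Python sketch below is illustrative only, and it uses a bound of our own rather than the length of $\theta$: the truth of a formula at position 0 depends only on positions $0, \dots, D$, where $D$ is the largest sum of interval upper bounds along a path of the syntax tree, so a satisfiable formula has a model of length at most $D + 1$. The tuple encoding and all function names are hypothetical.

```python
from itertools import product


def holds(trace, i, phi):
    """MLTL semantics for BNF formulas (True, atoms, not, and, U)."""
    if phi is True:
        return True
    if isinstance(phi, str):
        return i < len(trace) and phi in trace[i]
    if phi[0] == "not":
        return not holds(trace, i, phi[1])
    if phi[0] == "and":
        return holds(trace, i, phi[1]) and holds(trace, i, phi[2])
    _, a, b, f, g = phi
    if len(trace) - i <= a:
        return False
    for k in range(a, min(b, len(trace) - i - 1) + 1):
        if holds(trace, i + k, g):
            return True
        if not holds(trace, i + k, f):
            return False
    return False


def depth(phi):
    """Largest sum of Until upper bounds along a path of the syntax tree."""
    if phi is True or isinstance(phi, str):
        return 0
    if phi[0] == "not":
        return depth(phi[1])
    if phi[0] == "and":
        return max(depth(phi[1]), depth(phi[2]))
    return phi[2] + max(depth(phi[3]), depth(phi[4]))


def atoms(phi):
    """Atomic propositions occurring in phi."""
    if phi is True:
        return set()
    if isinstance(phi, str):
        return {phi}
    children = phi[3:] if phi[0] == "U" else phi[1:]
    return set().union(*(atoms(c) for c in children))


def brute_force_sat(phi):
    """Return a satisfying trace of length <= depth(phi) + 1, or None."""
    aps = sorted(atoms(phi))
    letters = [frozenset(p for p, bit in zip(aps, bits) if bit)
               for bits in product([False, True], repeat=len(aps))]
    for n in range(1, depth(phi) + 2):
        for trace in product(letters, repeat=n):
            if holds(trace, 0, phi):
                return list(trace)
    return None


print(brute_force_sat(("and", "p", ("not", "p"))))  # None
print(brute_force_sat(("U", 2, 3, True, "p")))      # [frozenset(), frozenset(), frozenset({'p'})]
```

The search is exponential in the trace length and in the number of atoms, which mirrors the NEXPTIME upper bound rather than improving on it.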

Before proving the NEXPTIME lower bound, recall the PSPACE-lower-bound 
proof in [45] for LTL satisfiability. The proof reduces the acceptance problem for a 
linear-space bounded Turing machine $M$ to LTL satisfiability. Given a Turing machine 
$M$ and an integer $k$, we construct a formula $\varphi_M$ such that $\varphi_M$ is satisfiable iff $M$ 
accepts the empty tape using $k$ tape cells. The argument is that we can encode such a 
space-bounded computation of $M$ by a trace $\pi$ of length $c^k$ for some constant $c$, and 
then use $\varphi_M$ to force $\pi$ to encode an accepting computation of $M$. The formula $\varphi_M$ 
has to match corresponding points in successive configurations of $M$, which can be 
expressed using $O(k)$ nested $\mathcal{X}$'s, since such points are $O(k)$ points apart. 

To prove a NEXPTIME-lower bound for MLTL, we reduce the acceptance problem 
for exponentially bounded non-deterministic Turing machines to MLTL satisfiability. 
Given a non-deterministic Turing machine $M$ and an integer $k$, we construct an MLTL 
formula $\varphi_M$ of length $O(k)$ such that $\varphi_M$ is satisfiable iff $M$ accepts the empty tape in 
time $2^k$. Note that such a computation of a $2^k$-time bounded Turing machine consists of 
$2^k$ many configurations of length $2^k$ each, so the whole computation is of exponential 
length, $4^k$, and can be encoded by a trace $\pi$ of length $4^k$, where every point of $\pi$ 
encodes one cell in the computation of $M$. Unlike the reduction in [45], in the encoding 
here corresponding points in successive configurations are exponentially far ($2^k$) from 
each other, because each configuration has $2^k$ cells, so the relationship between such 
successive points cannot be expressed in LTL with a formula of size polynomial in $k$. Because, however, the constants in the 
intervals of MLTL are written in base-10 (decimal) notation, we can write formulas of 
size $O(k)$, e.g., formulas of the form $p\,\mathcal{U}_{[0,2^k]}\,q$, that relate points that are $2^k$ apart. 
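The succinctness gap is simple arithmetic: the constant $2^k$ written in decimal has $\lfloor k \log_{10} 2 \rfloor + 1 \le k/3 + 1$ digits, so a formula such as $p\,\mathcal{U}_{[0,2^k]}\,q$ has size $O(k)$, while expressing the same distance with nested $\mathcal{X}$'s would need $2^k$ of them. A tiny Python check (the helper name and the string syntax are only illustrative):

```python
def until_formula(k):
    """Illustrative concrete syntax for p U_[0,2^k] q with a decimal bound."""
    return f"p U[0,{2 ** k}] q"


for k in (10, 20, 40):
    print(k, len(str(2 ** k)), until_formula(k))
# 10 4 p U[0,1024] q
# 20 7 p U[0,1048576] q
# 40 13 p U[0,1099511627776] q
```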

The key is to express the fact that one Turing machine configuration is a proper 
successor of another configuration using a formula of size $O(k)$. In the PSPACE-lower-bound 
proof of [45], LTL formulas of size $O(k)$ relate successive configurations of 
$k$-space-bounded machines. Here MLTL formulas of size $O(k)$ relate successive 
configurations of $2^k$-time-bounded machines. Thus, we can write a formula $\varphi_M$ of length 
$O(k)$ that forces trace $\pi$ to encode a computation of $M$ of length $2^k$. 


Now we consider MLTL$_0$ formulas, and prove that the complexity of checking the 
satisfiability of MLTL$_0$ formulas is PSPACE-complete. We first introduce the following 
lemma to show an inherent feature of MLTL$_0$ formulas. 


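Lemma 2. The conjunction of MLTL$_0$ U-rooted formulas that differ only in their intervals 
is equivalent to the conjunct with the smallest interval range: 
$(\xi\,\mathcal{U}_{[0,a]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,b]}\,\psi) \equiv (\xi\,\mathcal{U}_{[0,a]}\,\psi)$, where $b > a$. 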


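Proof. We first prove that for $i \ge 0$, the equation $(\xi\,\mathcal{U}_{[0,i]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,i+1]}\,\psi) \equiv 
(\xi\,\mathcal{U}_{[0,i]}\,\psi)$ holds. When $i = 0$, we have $f(\xi\,\mathcal{U}_{[0,0]}\,\psi) = f(\psi)$ and $f(\xi\,\mathcal{U}_{[0,1]}\,\psi) = 
f(\psi) \vee (f(\xi) \wedge \mathcal{X}(f(\psi)))$. So $(\xi\,\mathcal{U}_{[0,0]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,1]}\,\psi) \equiv f(\psi) \equiv (\xi\,\mathcal{U}_{[0,0]}\,\psi)$ is 
true. Inductively, assume that $(\xi\,\mathcal{U}_{[0,k]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,k+1]}\,\psi) \equiv (\xi\,\mathcal{U}_{[0,k]}\,\psi)$ is true for 
$k \ge 0$. When $i = k + 1$, we have $f(\xi\,\mathcal{U}_{[0,k+1]}\,\psi) = f(\psi) \vee (f(\xi) \wedge \mathcal{X} f(\xi\,\mathcal{U}_{[0,k]}\,\psi))$ 
and $f(\xi\,\mathcal{U}_{[0,k+2]}\,\psi) = f(\psi) \vee (f(\xi) \wedge \mathcal{X} f(\xi\,\mathcal{U}_{[0,k+1]}\,\psi))$. By the induction hypothesis, 
$(\xi\,\mathcal{U}_{[0,k]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,k+1]}\,\psi) \equiv (\xi\,\mathcal{U}_{[0,k]}\,\psi)$ implies that the following equivalence is 
true: 

$$\begin{aligned} f(\xi\,\mathcal{U}_{[0,k+1]}\,\psi) \wedge f(\xi\,\mathcal{U}_{[0,k+2]}\,\psi) &\equiv \big(f(\psi) \vee (f(\xi) \wedge \mathcal{X} f(\xi\,\mathcal{U}_{[0,k]}\,\psi))\big) \wedge \big(f(\psi) \vee (f(\xi) \wedge \mathcal{X} f(\xi\,\mathcal{U}_{[0,k+1]}\,\psi))\big)\\ &\equiv f(\psi) \vee \big(f(\xi) \wedge \mathcal{X}(f(\xi\,\mathcal{U}_{[0,k]}\,\psi) \wedge f(\xi\,\mathcal{U}_{[0,k+1]}\,\psi))\big)\\ &\equiv f(\psi) \vee \big(f(\xi) \wedge \mathcal{X} f(\xi\,\mathcal{U}_{[0,k]}\,\psi)\big)\\ &\equiv f(\xi\,\mathcal{U}_{[0,k+1]}\,\psi). \end{aligned}$$

Since $(\xi\,\mathcal{U}_{[0,i]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,i+1]}\,\psi) \equiv (\xi\,\mathcal{U}_{[0,i]}\,\psi)$ is true for every $i \ge 0$, we can prove by induction that 
$(\xi\,\mathcal{U}_{[0,i]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,j]}\,\psi) \equiv (\xi\,\mathcal{U}_{[0,i]}\,\psi)$ is true, where $j > i$. Because $b > a$, this 
directly implies that $(\xi\,\mathcal{U}_{[0,a]}\,\psi) \wedge (\xi\,\mathcal{U}_{[0,b]}\,\psi) \equiv (\xi\,\mathcal{U}_{[0,a]}\,\psi)$ is true. 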
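Lemma 2 can also be sanity-checked exhaustively on short traces. The Python sketch below is our own illustration, with atoms `x` and `y` standing for $\xi$ and $\psi$ and hypothetical function names. It confirms on all traces up to length 5 that $\xi\,\mathcal{U}_{[0,a]}\,\psi$ implies $\xi\,\mathcal{U}_{[0,b]}\,\psi$ for $b > a$, which is exactly what the equivalence states. It also shows why the intervals must start at 0. This is evidence, not a proof.

```python
from itertools import product


def until(tr, a, b, xi, psi):
    """MLTL semantics of xi U_[a,b] psi at position 0, for atoms xi and psi."""
    if len(tr) <= a:
        return False
    for i in range(a, min(b, len(tr) - 1) + 1):
        if psi in tr[i]:
            return True
        if xi not in tr[i]:
            return False
    return False


def check_lemma2(max_len=5, max_bound=4):
    letters = [set(), {"x"}, {"y"}, {"x", "y"}]
    for n in range(1, max_len + 1):
        for tr in product(letters, repeat=n):
            for a in range(max_bound + 1):
                small = until(tr, 0, a, "x", "y")
                for b in range(a + 1, max_bound + 1):
                    if (small and until(tr, 0, b, "x", "y")) != small:
                        return False
    return True


print(check_lemma2())  # True

# Without the MLTL_0 restriction the smaller interval is not "stronger":
tr = [set(), {"x"}, {"y"}]
print(until(tr, 1, 2, "x", "y"), until(tr, 0, 2, "x", "y"))  # True False
```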


Lemma 3. $\mathcal{X}$-free LTL$_f$-SAT is reducible to MLTL$_0$-SAT at a linear cost. 


Proof. According to [45], the satisfiability checking of $\mathcal{X}$-free LTL formulas is still 
PSPACE-complete. This also applies to the satisfiability checking of $\mathcal{X}$-free LTL$_f$ 
formulas. Given an $\mathcal{X}$-free LTL$_f$ formula $\varphi$, we construct the corresponding MLTL formula 
$m(\varphi)$ recursively as follows: 

- $m(p) = p$ where $p$ is an atom; 
- $m(\neg\xi) = \neg m(\xi)$; 
- $m(\xi \wedge \psi) = m(\xi) \wedge m(\psi)$; 
- $m(\xi\,\mathcal{U}\,\psi) = m(\xi)\,\mathcal{U}_{[0,2^{|\varphi|}]}\,m(\psi)$. 

Notably, for the Until LTL$_f$ formula, we bound it with the interval $[0, 2^{|\varphi|}]$, where 
$\varphi$ is the original $\mathcal{X}$-free LTL$_f$ formula, in the corresponding MLTL formula, which is 
motivated by the fact that every satisfiable LTL$_f$ formula has a finite model whose length 
is less than $2^{|\varphi|}$ [14]. The above translation has linear blow-up, because the integers in 
intervals use the decimal notation. Now we prove by induction over the type of $\varphi$ that 
$\varphi$ is satisfiable iff $m(\varphi)$ is satisfiable. That is, we prove that ($\Rightarrow$) $\pi \models \varphi$ implies 
$\pi \models m(\varphi)$ and ($\Leftarrow$) $\pi \models m(\varphi)$ implies $\pi \models \varphi$, for some finite trace $\pi$. 

We consider the Until formula $\eta = \xi\,\mathcal{U}\,\psi$ (noting that $\varphi$ is fixed to the original 
LTL$_f$ formula); the proofs are trivial for the other types. ($\Rightarrow$) $\eta$ is satisfiable implies 
there is a finite trace $\pi$ such that $\pi \models \eta$ and $|\pi| < 2^{|\varphi|}$ [14]. Moreover, $\pi \models \eta$ holds 
iff there is $0 \le i$ such that $\pi_i \models \psi$ and for every $0 \le j < i$, $\pi_j \models \xi$ is true (from 
LTL$_f$ semantics). By the induction hypothesis, $\pi_i \models \psi$ implies $\pi_i \models m(\psi)$ and $\pi_j \models \xi$ 
implies $\pi_j \models m(\xi)$. Also, $i \le 2^{|\varphi|}$ is true because $|\pi| < 2^{|\varphi|}$. As a result, $\pi \models \eta$ 
implies that there is $0 \le i \le 2^{|\varphi|}$ such that $\pi_i \models m(\psi)$ and for every $0 \le j < i$, 
$\pi_j \models m(\xi)$ is true. According to the MLTL semantics, $\pi \models m(\eta)$ is true. ($\Leftarrow$) $m(\eta)$ 
is satisfiable implies there is a finite trace $\pi$ such that $\pi \models m(\eta)$. According to MLTL 
semantics, there is $0 \le i \le 2^{|\varphi|}$ such that $\pi_i \models m(\psi)$ and for every $0 \le j < i$ it 
holds that $\pi_j \models m(\xi)$. By the induction hypothesis, $\pi_i \models m(\psi)$ implies $\pi_i \models \psi$ and 
$\pi_j \models m(\xi)$ implies $\pi_j \models \xi$. Also, $0 \le i \le 2^{|\varphi|}$ implies $0 \le i$. As a result, $\pi \models m(\eta)$ 
implies that there is $0 \le i$ such that $\pi_i \models \psi$ and for every $0 \le j < i$ it holds that 
$\pi_j \models \xi$. From LTL$_f$ semantics, it is true that $\pi \models \eta$. 
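The translation $m$ is equally direct. In the Python sketch below (our illustration, with a hypothetical tuple encoding of $\mathcal{X}$-free LTL$_f$ formulas), $|\varphi|$ is taken to be the number of syntax-tree nodes, which is one reasonable reading of formula size. The bound $2^{|\varphi|}$ is written in decimal, so each interval adds only $O(|\varphi|)$ digits.

```python
def ltlf_size(f):
    """|phi| as the number of syntax-tree nodes (our assumed size measure)."""
    if not isinstance(f, tuple):
        return 1
    return 1 + sum(ltlf_size(c) for c in f[1:])


def m(f, bound):
    """Map an X-free LTL_f formula to MLTL_0, bounding every Until by [0, bound]."""
    if f is True or isinstance(f, str):
        return f
    op = f[0]
    if op == "not":
        return ("not", m(f[1], bound))
    if op == "and":
        return ("and", m(f[1], bound), m(f[2], bound))
    if op == "U":
        return ("U", 0, bound, m(f[1], bound), m(f[2], bound))
    raise ValueError(f"operator {op!r} is not allowed in X-free LTL_f")


def to_mltl0(phi):
    return m(phi, 2 ** ltlf_size(phi))


print(to_mltl0(("U", "p", ("not", "q"))))  # ('U', 0, 16, 'p', ('not', 'q'))
```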
Theorem 2. The complexity of checking the satisfiability of MLTL$_0$ is 
PSPACE-complete. 


Proof. Since Lemma 3 shows a linear reduction from $\mathcal{X}$-free LTL$_f$-SAT to MLTL$_0$-SAT 
and $\mathcal{X}$-free LTL$_f$-SAT is PSPACE-complete [14], it directly implies that MLTL$_0$-SAT 
is PSPACE-hard. 


For the upper bound, recall from the proof of Theorem 1 that an MLTL formula $\varphi$ is 
translated to an LTL$_f$ formula $\theta$ of length $K \cdot |cl(\varphi)|$, which, as we commented, involves 
an exponential blow-up in the notation for $K$. Following the automata-theoretic 
approach for satisfiability, one would translate $\theta$ to an NFA and check its non-emptiness 
[14]. Normally, such a translation would involve another exponential blow-up. We show 
that this is not the case for MLTL$_0$. Recalling from the automaton construction in [14] 
that every state of the automaton is a set of subformulas of $\theta$, the size of a state is at 
most $K \cdot |cl(\varphi)|$. In the general case, if $\psi_1, \psi_2$ are two subformulas of $\theta$ corresponding 
to the MLTL formulas $\xi\,\mathcal{U}_{I_1}\,\psi$ and $\xi\,\mathcal{U}_{I_2}\,\psi$, then $\psi_1$ and $\psi_2$ can be in the same state of the 
automaton, which implies that the size of a state can be as large as $K \cdot |cl(\varphi)|$. When the 
formula $\varphi$ is restricted to MLTL$_0$, we show that the exponential blow-up can be avoided. 
Lemma 2 shows that keeping either $\psi_1$ or $\psi_2$ in the state is enough, since, assuming $I_1 \subseteq I_2$, 
we have $(\psi_1 \wedge \psi_2) \equiv \psi_1$ by Lemma 2. So the size of a state in the automaton for an 
MLTL$_0$ formula $\varphi$ is at most $|cl(\varphi)|$. For each subformula in the state, there can be $K$ 
possible values (e.g., for $\Diamond\xi$ in the state, we can have $\Diamond_{[0,1]}\xi$, $\Diamond_{[0,2]}\xi$, etc.). Therefore 
the size of the automaton is in $O(2^{|cl(\varphi)|} \cdot K^{|cl(\varphi)|}) = 2^{O(|cl(\varphi)| \cdot \log K)}$. Since $\log K$ is linear in the 
length of the decimal notation for $K$, every state has a polynomial-size description, and 
non-emptiness of this automaton can be checked on the fly in polynomial space. Therefore, MLTL$_0$ 
satisfiability checking is a PSPACE-complete problem. 


4 Implementation of MLTL-SAT 


We first show how to reduce MLTL-SAT to the well-explored LTL$_f$-SAT and 
LTL-SAT. Then we introduce two new satisfiability-checking strategies based on the inherent 
properties of MLTL formulas, which are able to leverage state-of-the-art model-checking 
and SMT-solving techniques. 


4.1 MLTL-SAT via Logic Translation 


For a formula $\varphi$ from one logic and $\psi$ from another logic, we say $\varphi$ and $\psi$ are 
equi-satisfiable when $\varphi$ is satisfiable under its semantics iff $\psi$ is satisfiable under its 
semantics. Based on Lemma 1 and Theorem 1, we have the following corollary. 


Corollary 1 (MLTL-SAT to LTL$_f$-SAT). MLTL-SAT can be reduced to LTL$_f$-SAT 
with an exponential blow-up. 


From Corollary 1, MLTL-SAT is reducible to LTL$_f$-SAT, enabling use of 
off-the-shelf LTL$_f$ satisfiability solvers, cf. aaltaf [23]. It is also straightforward to consider 
MLTL-SAT via LTL-SAT; LTL-SAT has been studied for more than a decade, and 
many off-the-shelf LTL solvers are available, cf. [24,38,40]. 
Theorem 3 (MLTL to LTL). For an MLTL formula $\varphi$, there is an LTL formula $\theta$ such 
that $\varphi$ and $\theta$ are equi-satisfiable, and the size of $\theta$ is in $O(K \cdot |cl(\varphi)|)$, where $K$ is the 
maximal integer in $\varphi$. 


Proof. Lemma 1 provides a translation from the MLTL formula $\varphi$ to the equivalent 
LTL$_f$ formula $\varphi'$, with a blow-up of $O(K \cdot |cl(\varphi)|)$. As shown in Sect. 2, there is a 
linear translation from the LTL$_f$ formula $\varphi'$ to its equi-satisfiable LTL formula $\theta$ [14]. 
Therefore, the blow-up from $\varphi$ to $\theta$ is in $O(K \cdot |cl(\varphi)|)$. 


Corollary 2 (MLTL-SAT to LTL-SAT). MLTL-SAT can be reduced to LTL-SAT with 
an exponential blow-up. 


Since MLTL-SAT is reducible to LTL-SAT, MLTL-SAT can also benefit from the 
power of LTL satisfiability solvers. Moreover, the reduction from MLTL-SAT to 
LTL-SAT enables leveraging modern model-checking techniques to solve the MLTL-SAT 
problem, due to the fact that LTL-SAT has been shown to be reducible to model 
checking with a linear blow-up [38,39]. 


Corollary 3 (MLTL-SAT to LTL-Model-checking). MLTL-SAT can be reduced to 
LTL model checking with an exponential blow-up. 


In our implementation, we choose the model checker nuXmv [12] for LTL 
satisfiability checking, as it allows an LTL formula to be directly input as the temporal 
specification together with a universal model as described in [38,39]. 


4.2 Model Generation 


Using the LTL formula as the temporal specification in nuXmv has been shown, 
however, not to be the most efficient way to use model checking for satisfiability checking 
[40]. Consider the MLTL formula $\Diamond_{[0,10]} a \wedge \Diamond_{[1,11]} a$. The translated LTL$_f$ formula is 
$f(\Diamond_{[0,10]} a) \wedge \mathcal{X}(f(\Diamond_{[0,10]} a))$, where $f(\Diamond_{[0,10]} a)$ has to be constructed twice. To avoid 
such redundant construction, we follow [40] and directly encode the input MLTL 
formula as an SMV model (the input model of nuXmv) rather than treating the LTL 
formula, which is obtained from the input MLTL formula, as a specification. 

An SMV [27] model consists of a Boolean transition system $Sys = (V, I, T)$, 
where $V$ is a set of Boolean variables, $I$ is a Boolean formula representing the initial 
states of $Sys$, and $T$ is the Boolean transition formula. Moreover, a specification to be 
verified against the system is also contained in the SMV model (here we focus on the 
LTL specification). Given the input MLTL formula $\varphi$, we construct the corresponding 
SMV model $M_\varphi$ as follows. 


- Introduce a Boolean variable for each atom in $\varphi$ as well as for “Tail” (a new variable 
identifying the end of a finite trace). 

- Introduce a Boolean variable $X_\psi$ for each U formula $\psi$ in $cl^*(\varphi)$, which represents 
the intermediate temporal formula $\mathcal{X}\psi$. 

- Introduce a temporary Boolean variable² $T_\psi$ for each U formula $\psi$ in $cl^*(\varphi)$. 


> A temporary variable is introduced in the DEFINE statement rather than the VAR statement of 
the SMV model, as it will be automatically replaced with those in VAR statements. 
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— A Boolean formula e(7) is used to represent the formula 4 in cl*(y) in the SMV 
model, which is defined recursively as follows. 
1. e(w) = V, if v is an Boolean atom; 
2. e(p) = re(y1), if Y = mY; 
3. e() = ef) A e(wa), if Y = y1 A Yo; 
4. e(w) = T4, if y is an U formula. 
— Let the initial Boolean formula of the system Sys be e(y). 
— For each temporary variable T'_y, create a DEFINE statement according to the type 
and interval of Y, as follows. 


X_(WUja—1,b-1]¥2), if0 <a < b; 
Ty Ua sjv2 = $ elwa) V (eli) A ¥ (ai0,0-1)¥2)), if a = 0 and 0 < b; 
e(q2), if a = 0 and b = 0. 
— Create the Boolean formula (V_y > (“Tail A next(e(w)))) for each Xy in the 
VAR list (the set V in Sys) of the SMV model. 


— Finally, designate the LTL formula O—T ail as the temporal specification of the SMV 
model M, (which implies that a counterexample trace satisfies (Tail). 
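The construction above can be mechanized in a few dozen lines. The following is a minimal Python sketch, not the MLTLconverter implementation: the formula classes (`Atom`, `Not`, `And`, `Until`), the naming scheme `T_i`/`X_i`, and the choice to emit the initial condition as `INIT` and the $X_\psi$ constraints as `TRANS` are our own assumptions. The closure $cl^*(\varphi)$ is built lazily: every interval-shifted U subformula reached while writing a DEFINE gets its own `T_i`/`X_i` pair. R formulas and the MLTL₀ heuristics are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"
    a: int
    b: int


Formula = Union[Atom, Not, And, Until]


class SmvEncoder:
    """Sketch of the M_phi construction (hypothetical layout of the SMV text)."""

    def __init__(self) -> None:
        self.ids: Dict[Until, int] = {}   # U formula -> index used in T_i / X_i
        self.atoms: Set[str] = set()
        self.defines: List[str] = []      # DEFINE entries for the T_psi variables
        self.trans: List[str] = []        # X_psi <-> (!Tail & next(e(psi)))

    def uid(self, f: Until) -> int:
        if f not in self.ids:
            self.ids[f] = len(self.ids)   # assign the id before recursing
            self._define(f)
        return self.ids[f]

    def e(self, f: Formula) -> str:
        """The Boolean formula e(psi) representing psi in the model."""
        if isinstance(f, Atom):
            self.atoms.add(f.name)
            return f.name
        if isinstance(f, Not):
            return f"!({self.e(f.arg)})"
        if isinstance(f, And):
            return f"({self.e(f.left)} & {self.e(f.right)})"
        return f"T_{self.uid(f)}"

    def _define(self, f: Until) -> None:
        i = self.ids[f]
        lhs, rhs = self.e(f.left), self.e(f.right)
        if f.a > 0:      # shift the interval: X_{psi1 U[a-1,b-1] psi2}
            body = f"X_{self.uid(Until(f.left, f.right, f.a - 1, f.b - 1))}"
        elif f.b > 0:    # unfold one step of psi1 U[0,b] psi2
            nxt = self.uid(Until(f.left, f.right, 0, f.b - 1))
            body = f"{rhs} | ({lhs} & X_{nxt})"
        else:            # psi1 U[0,0] psi2 reduces to psi2
            body = rhs
        self.defines.append(f"  T_{i} := {body};")
        self.trans.append(f"TRANS X_{i} <-> (!Tail & next(T_{i}))")

    def model(self, phi: Formula) -> str:
        init = self.e(phi)   # fills atoms, ids, defines, and trans
        lines = ["MODULE main", "VAR", "  Tail : boolean;"]
        lines += [f"  {p} : boolean;" for p in sorted(self.atoms)]
        lines += [f"  X_{i} : boolean;" for i in sorted(self.ids.values())]
        if self.defines:
            lines += ["DEFINE"] + self.defines
        lines.append(f"INIT {init}")
        lines += self.trans
        lines.append("LTLSPEC G !Tail")
        return "\n".join(lines)
```

Because identical U subformulas map to the same index, a shifted subformula such as $q\,U_{[0,1]}\,p$ is defined once even if several DEFINE statements refer to it, which is the redundancy the direct SMV encoding is meant to avoid.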


Encoding Heuristics for MLTL₀ Formulas. We also encode the rules shown in Lemma 2 to prune the state space when checking the satisfiability of MLTL₀ formulas. These rules
are encoded using the INVAR constraint in the SMV model. Taking the U formula 
as an example, we encode T-(Y1Uio V2) A T-ilo,a-1)2) > T-(il(o,0--1}¥2) 
($a > 0$) for each $\psi_1 U_{[0,a]} \psi_2$ in $cl^*(\varphi)$. Similar encodings also apply to the R formulas in $cl^*(\varphi)$. Theorem 4 below guarantees the correctness of the translation; it can be proved by induction over the type of $\varphi$ and the construction of the SMV model.


Theorem 4. The MLTL formula $\varphi$ is satisfiable iff the corresponding SMV model $M_\varphi$ violates the LTL property $\Box\neg Tail$.


There are different techniques that can be used for LTL model checking. Based 
on the latest evaluation of LTL satisfiability checking [24], the KLIVE [13] back-end 
implemented in the SMV model checker nuXmv [12] produces the best performance. 
We thus choose KLIVE as our model-checking technique for MLTL-SAT. 


Bounded MLTL-SAT. Although MLTL-SAT is reducible to the satisfiability problem of other well-explored logics with established off-the-shelf satisfiability solvers, a dedicated solution based on inherent properties of MLTL may be superior. One intuition is that, since all intervals in MLTL formulas are bounded, the satisfiability of a formula can be reduced to Bounded Model Checking (BMC) [9].


Theorem 5. Given an MLTL formula $\varphi$ with $K$ as the largest natural number in the intervals of $\varphi$, $\varphi$ is satisfiable iff there is a finite trace $\pi$ with $|\pi| \le K \cdot |cl(\varphi)|$ such that $\pi \models \varphi$.


Theorem 5 states that the satisfiability of a given MLTL formula can be reduced to checking for the existence of a satisfying trace of bounded length. To apply the BMC technique in nuXmv, we compute the maximal depth of BMC as $K \cdot |cl(\varphi)|$ for a given MLTL formula $\varphi$ and set it accordingly. The input SMV model for BMC is still $M_\varphi$, as described in Sect. 4.2.
However, to ensure correct BMC checking in nuXmv, the constraint “FAIRNESS TRUE” has to be added to the SMV model.³ The LTLSPEC remains $\Box\neg Tail$. According to Theorem 5, $\varphi$ is satisfiable iff the model checker returns a counterexample using the BMC technique within the maximal depth of $K \cdot |cl(\varphi)|$.
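As a rough illustration of the depth bound of Theorem 5, the sketch below computes $K \cdot |cl(\varphi)|$ for a formula given as nested Python tuples. The tuple representation is hypothetical, and $cl(\varphi)$ is approximated here by the set of syntactic subformulas of $\varphi$; if the closure used in the paper contains additional formulas, `closure` must be adapted accordingly.

```python
from typing import Set, Tuple

# Hypothetical formula representation (nested tuples):
#   ("atom", name) | ("not", f) | ("and", f, g) | ("until", f, g, a, b)
Formula = Tuple


def closure(f: Formula) -> Set[Formula]:
    """Syntactic subformulas of f (assumed stand-in for cl(f))."""
    tag = f[0]
    if tag == "atom":
        return {f}
    if tag == "not":
        return {f} | closure(f[1])
    return {f} | closure(f[1]) | closure(f[2])   # "and" and "until"


def max_bound(f: Formula) -> int:
    """K: the largest natural number occurring in an interval of f."""
    tag = f[0]
    if tag == "atom":
        return 0
    if tag == "not":
        return max_bound(f[1])
    own = max(f[3], f[4]) if tag == "until" else 0
    return max(own, max_bound(f[1]), max_bound(f[2]))


def bmc_depth(f: Formula) -> int:
    """Maximal BMC depth K * |cl(f)| suggested by Theorem 5."""
    return max_bound(f) * len(closure(f))
```

For $q\,U_{[1,2]}\,p$ this gives $K = 2$ and three subformulas, hence a depth of 6.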


4.3 MLTL-SAT via SMT Solving 


Another approach to solving MLTL-SAT is via SMT solving, since handling the intervals in MLTL formulas with SMT solvers is straightforward. Since the input logic of SMT solvers is First-Order Logic, we must first translate the MLTL formula into an equi-satisfiable formula in First-Order Logic over the natural domain $\mathbb{N}$. We assume that readers are familiar with First-Order Logic and focus only on the translation. Given an MLTL formula $\varphi$ and the alphabet $\Sigma$, we construct the corresponding formula in First-Order Logic over $\mathbb{N}$ in the following way.


1. For each $p \in \Sigma$, define a corresponding function $f_p : Int \to Bool$ such that $f_p(k)$ is true ($k \in \mathbb{N}$) iff there is a satisfying (finite) trace $\pi$ of $\varphi$ and $p$ is in $\pi[k]$.
2. The First-Order Logic formula $fol(\varphi, k, len)$ for $\varphi$ ($k, len \in \mathbb{N}$) is constructed recursively as below:
   — $fol(true, k, len) = (len > k)$ and $fol(false, k, len) = false$;
   — $fol(p, k, len) = (len > k) \wedge f_p(k)$ for $p \in \Sigma$;
   — $fol(\neg\xi, k, len) = (len > k) \wedge \neg fol(\xi, k, len)$;
   — $fol(\xi \wedge \psi, k, len) = (len > k) \wedge fol(\xi, k, len) \wedge fol(\psi, k, len)$;
   — $fol(\xi\, U_{[a,b]}\, \psi, k, len) = (len > a+k) \wedge \exists i.\big((a+k \le i \le b+k) \wedge fol(\psi, i, len - i) \wedge \forall j.((a+k \le j < i) \to fol(\xi, j, len - j))\big)$.


In the formula $fol(\varphi, k, len)$, $k$ represents the index of the (finite) trace from which $\varphi$ is evaluated, and $len$ indicates the length of the suffix of the trace starting from index $k$. Since the formula is constructed recursively, we need to introduce $k$ to record the index. Meanwhile, $len$ is necessary because the MLTL semantics, which is interpreted over finite traces, constrains the lengths of the satisfying traces of Until formulas. The following theorem guarantees that MLTL-SAT is reducible to the satisfiability of First-Order Logic.


Theorem 6. For an MLTL formula $\varphi$, $\varphi$ is satisfiable iff the corresponding First-Order Logic formula $\exists len.fol(\varphi, 0, len)$ is satisfiable.


Proof. Let the alphabet of $\varphi$ be $\Sigma$, and let $\pi \in (2^\Sigma)^*$ be a finite trace. For each $p \in \Sigma$, we define the function $f_p : Int \to Bool$ as follows: $f_p(k) = true$ iff $p \in \pi[k]$, for $0 \le k < |\pi|$. We now prove, by induction over the type of $\varphi$ and the construction of $fol(\varphi, k, len)$ with respect to $\varphi$, that $\pi_k \models \varphi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\varphi, k, |\pi|)$, where $|\pi|$ is the length of $\pi$. The cases when $\varphi$ is true or false are trivial.


— If $\varphi = p$ is an atom, $\pi_k \models \varphi$ holds iff $p \in \pi[k]$ (i.e., $p \in \pi_k[0]$), which means $f_p(k) = true$. As a result, $\{f_p\}$ is a model of $fol(\varphi, k, |\pi|)$, which implies that $\pi_k \models \varphi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\varphi, k, |\pi|)$.

— If $\varphi = \neg\xi$, $\pi_k \models \varphi$ holds iff $\pi_k \not\models \xi$. By the induction hypothesis, $\pi_k \models \xi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\xi, k, |\pi|)$, which is equivalent to saying that $\pi_k \not\models \xi$ holds iff $\{f_p \mid p \in \Sigma\}$ is not a model of $fol(\xi, k, |\pi|)$. As a result, $\pi_k \models \neg\xi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $\neg fol(\xi, k, |\pi|)$.

— If $\varphi = \xi \wedge \psi$, $\pi_k \models \varphi$ holds iff $\pi_k \models \xi$ and $\pi_k \models \psi$. By the induction hypothesis, $\pi_k \models \xi$ (resp. $\pi_k \models \psi$) holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\xi, k, |\pi|)$ (resp. $fol(\psi, k, |\pi|)$). Hence, by the construction of the $fol$ function, $\pi_k \models \xi \wedge \psi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\xi \wedge \psi, k, |\pi|)$.

— If $\varphi = \xi\, U_{[a,b]}\, \psi$, $\pi_k \models \varphi$ holds iff there is $a+k \le i \le b+k$ such that $\pi_i \models \psi$ and $\pi_j \models \xi$ holds for every $a+k \le j < i$. By the induction hypothesis, $\pi_i \models \psi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\psi, i, |\pi| - i)$ (the length of $\pi_i$ is $|\pi| - i$), and $\pi_j \models \xi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\xi, j, |\pi| - j)$ (the length of $\pi_j$ is $|\pi| - j$). Moreover, $|\pi| > a + k$ must be true according to the MLTL semantics. As a result, $\pi_k \models \xi\, U_{[a,b]}\, \psi$ holds iff $\{f_p \mid p \in \Sigma\}$ is a model of $fol(\xi\, U_{[a,b]}\, \psi, k, |\pi|)$.

□

This proof holds for all values of $k$, including the special case where $k = 0$.

³ Based on comments in emails from the nuXmv developers.


We then encode $\exists len.fol(\varphi, 0, len)$ into the SMT-LIB v2 format [7], which is the input language of most modern SMT solvers; we call the full SMT-LIB v2 encoding SMT($\varphi$). We first use the “declare-fun” command to declare a function $f_p : Int \to Bool$ for each $p \in \Sigma$. We also define the function $f_\varphi : Int \times Int \to Bool$ for the First-Order Logic formula $fol(\varphi, k, len)$. The corresponding SMT-LIB v2 command is “(define-fun $f_\varphi$ ((k Int) (len Int)) Bool $S(fol(\varphi, k, len))$)”, where $S(fol(\varphi, k, len))$ is the SMT-LIB v2 implementation of $fol(\varphi, k, len)$. In detail, $S(fol(\varphi, k, len))$ is obtained recursively as follows.


— $S(fol(p, k, len))$ = (and (> len k) ($f_p$ k))

— $S(fol(\neg\psi, k, len))$ = (and (> len k) (not $S(fol(\psi, k, len))$))

— $S(fol(\psi_1 \wedge \psi_2, k, len))$ = (and (> len k) (and $S(fol(\psi_1, k, len))$ $S(fol(\psi_2, k, len))$))

— $S(fol(\psi_1\, U_{[a,b]}\, \psi_2, k, len))$ = (and (> len (+ a k)) (exists ((i Int)) (and (<= (+ a k) i) (<= i (+ b k)) $S(fol(\psi_2, i, len - i))$ (forall ((j Int)) (=> (and (<= (+ a k) j) (< j i)) $S(fol(\psi_1, j, len - j))$)))))


Finally, we use the “assert” command “(assert (exists ((len Int)) ($f_\varphi$ 0 len)))” together with the “(check-sat)” command to ask SMT solvers for the satisfiability of $\exists len.fol(\varphi, 0, len)$. In a nutshell, the general framework of the SMT-LIB v2 format for SMT($\varphi$) (i.e., $\exists len.fol(\varphi, 0, len)$) is shown in Table 1, and its correctness is guaranteed by Theorem 7 below.


Table 1. The SMT-LIB v2 template for SMT($\varphi$).

(declare-fun f_a (Int) Bool)   ; declare the corresponding function for each a ∈ Σ
; define the function for fol(φ, k, len)
(define-fun f_φ ((k Int) (len Int)) Bool S(fol(φ, k, len)))
(assert (exists ((len Int)) (f_φ 0 len)))
(check-sat)
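The recursive rules for $S(\cdot)$ and the template of Table 1 translate directly into a small generator. The Python sketch below is illustrative, not MLTLconverter's code: formulas are hypothetical nested tuples, $f_\varphi$ is written as the identifier `f_phi`, and each Until gets fresh bound variables (`i0`, `j0`, ...) so that nested quantifiers do not capture each other.

```python
from itertools import count
from typing import Callable, Set, Tuple

# Hypothetical formula representation (nested tuples):
#   ("true",) | ("false",) | ("atom", p) | ("not", f) | ("and", f, g)
#   | ("until", f, g, a, b)
Formula = Tuple


def S(f: Formula, k: str, n: str, fresh: Callable[[], int]) -> str:
    """SMT-LIB v2 text for fol(f, k, n), following the rules above."""
    tag = f[0]
    if tag == "true":
        return f"(> {n} {k})"
    if tag == "false":
        return "false"
    if tag == "atom":
        return f"(and (> {n} {k}) (f_{f[1]} {k}))"
    if tag == "not":
        return f"(and (> {n} {k}) (not {S(f[1], k, n, fresh)}))"
    if tag == "and":
        return (f"(and (> {n} {k}) (and {S(f[1], k, n, fresh)} "
                f"{S(f[2], k, n, fresh)}))")
    _, xi, psi, a, b = f              # xi U[a,b] psi
    m = fresh()                       # fresh names avoid variable capture
    i, j = f"i{m}", f"j{m}"
    body_psi = S(psi, i, f"(- {n} {i})", fresh)
    body_xi = S(xi, j, f"(- {n} {j})", fresh)
    return (f"(and (> {n} (+ {a} {k})) (exists (({i} Int)) (and "
            f"(<= (+ {a} {k}) {i}) (<= {i} (+ {b} {k})) {body_psi} "
            f"(forall (({j} Int)) (=> (and (<= (+ {a} {k}) {j}) (< {j} {i})) "
            f"{body_xi})))))")


def atoms(f: Formula) -> Set[str]:
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:] if isinstance(g, tuple)))


def smt_encoding(phi: Formula) -> str:
    """The full SMT(phi) script, laid out as in the Table 1 template."""
    fresh = count().__next__
    lines = [f"(declare-fun f_{p} (Int) Bool)" for p in sorted(atoms(phi))]
    lines.append(f"(define-fun f_phi ((k Int) (len Int)) Bool "
                 f"{S(phi, 'k', 'len', fresh)})")
    lines.append("(assert (exists ((len Int)) (f_phi 0 len)))")
    lines.append("(check-sat)")
    return "\n".join(lines)
```

The resulting text can be saved to a file and passed to Z3 with the `-smt2` flag listed in Table 2.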
Theorem 7. The First-Order Logic formula $\exists len.fol(\varphi, 0, len)$ is satisfiable iff the SMT solver returns SAT with the input SMT($\varphi$).


An inductive proof for the theorem can be conducted according to the construction of SMT($\varphi$). Notably, there is no difference between the SMT encoding for MLTL formulas and that for MLTL₀ formulas, as the SMT-based encoding does not require unrolling the temporal operators in the formula.


5 Experimental Evaluations 


Tools and Platform. We implemented the translator MLTLconverter in C++, including encodings of an MLTL formula as equi-satisfiable LTL and LTLf formulas, and as corresponding SMV and SMT-LIB v2 models. We leverage the extant LTL solver aalta [24], the LTLf solver aaltaf [23], the SMV model checker nuXmv [12], and the SMT solver Z3 [29] to check the satisfiability of the input MLTL formula in their respective encodings from MLTLconverter. The solvers, including the runtime flags we used, are summarized in Table 2. We evaluated both the BMC and KLIVE [13] model-checking back-ends in nuXmv; the corresponding commands are shown in Fig. 1. Notably, the maximal length “MAX” used to run BMC in the figure is computed dynamically for each MLTL formula, based on Theorem 5.


Table 2. List of solvers and their runtime flags.

Encoding   | MLTLconverter flag | Solver | Solver flag
LTL        | -ltl               | aalta  | default
LTLf       | -ltlf              | aaltaf | default
SMV        | -SMV               | nuXmv  | -source bmc.cmd (BMC); -source klive.cmd (KLIVE)
SMT-LIB v2 | -smtlib            | Z3     | -smt2

BMC (left):
read_model
flatten_hierarchy
encode_variables
build_boolean_model
bmc_setup
go_bmc
check_ltlspec_bmc -k MAX
quit

KLIVE (right):
read_model
flatten_hierarchy
encode_variables
build_boolean_model
check_ltlspec_klive -d
quit

Fig. 1. nuXmv commands for BMC (left) and KLIVE (right).
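The two command scripts in Fig. 1 differ only after `build_boolean_model`, so they can be produced from one shared prefix. The Python sketch below substitutes a computed depth for MAX; the file names `bmc.cmd` and `klive.cmd` follow Table 2, while the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Commands shared by both back-ends (Fig. 1).
COMMON = ["read_model", "flatten_hierarchy", "encode_variables",
          "build_boolean_model"]


def nuxmv_scripts(max_depth: int) -> Dict[str, str]:
    """Return the BMC and KLIVE command files, with MAX replaced by max_depth."""
    bmc = COMMON + ["bmc_setup", "go_bmc",
                    f"check_ltlspec_bmc -k {max_depth}", "quit"]
    klive = COMMON + ["check_ltlspec_klive -d", "quit"]
    return {"bmc.cmd": "\n".join(bmc) + "\n",
            "klive.cmd": "\n".join(klive) + "\n"}


def write_scripts(directory: Path, max_depth: int) -> None:
    """Write bmc.cmd and klive.cmd into directory (created if missing)."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, text in nuxmv_scripts(max_depth).items():
        (directory / name).write_text(text)
```

The depth passed in would be the per-formula value $K \cdot |cl(\varphi)|$ from Theorem 5; the KLIVE script does not depend on it.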
All experiments were executed on Rice University’s NOTS cluster,⁴ running RedHat 5, with 226 dual-socket compute blades housed within HPE s6500, HPE Apollo 2000, and Dell PowerEdge C6400 chassis. All the nodes are interconnected with a 10 GigE network. Each satisfiability check over one MLTL formula and one solver was executed with exclusive access to one CPU and 8 GB RAM, with a timeout of one hour, as measured by the Linux time command. We assigned a time penalty of one hour to benchmarks that produced a segmentation fault or timed out.


Experimental Goals. We evaluate performance along three metrics. (1) Each satisfiability check has two parts: the encoding time (consumed by MLTLconverter) and the solving time (consumed by solvers). We evaluate how each encoding affects the performance of both stages of MLTL-SAT. (2) We comparatively analyze the performance and scalability of end-to-end MLTL-SAT via LTL-SAT, LTLf-SAT, LTL model checking, and our new SMT-based approach. (3) We evaluate the performance and scalability of MLTL₀ satisfiability checking using the MLTL₀-SAT encoding heuristics (Lemma 2).


Benchmarks. There are few MLTL (or even MTL-over-naturals) benchmarks available for evaluation. Previous works on MTL over naturals [2-4] mainly focus on the theoretical exploration of the logic. To enable rigorous experimental evaluation, we developed three types of benchmarks, motivated by the generation of LTL benchmarks [38].⁵


(1) Random MLTL Formulas (R): We generated 10,000 R formulas, varying the formula length L (20, 40, 60, 80, 100), the number of variables N (1, 2, 3, 4, 5), and the probability of the appearance of the U operator P (0.33, 0.5, 0.7, 0.95); for each (L, N, P) we generated 100 formulas. For every U operator, we randomly chose an interval [i, j] where i ≥ 0 and j ≤ 100.
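The text does not spell out how a formula of length L is grown, so the following Python sketch is only one plausible generator, not the authors' tool: at each internal node it picks U with probability P (otherwise ¬ or ∧ with equal probability), uses atoms `p0` ... `p{N-1}`, and draws each interval [i, j] with 0 ≤ i ≤ j ≤ 100. Length is counted as the number of nodes, which is also an assumption. The grid of settings reproduces the count 5 × 5 × 4 × 100 = 10,000.

```python
import random
from itertools import product
from typing import List, Tuple

LENGTHS = (20, 40, 60, 80, 100)        # L
NUM_VARS = (1, 2, 3, 4, 5)             # N
PROB_UNTIL = (0.33, 0.5, 0.7, 0.95)    # P
PER_SETTING = 100


def random_mltl(length: int, n_vars: int, p_until: float,
                rng: random.Random, max_bound: int = 100) -> Tuple:
    """Random formula with exactly `length` nodes (hypothetical scheme)."""
    if length == 1:
        return ("atom", f"p{rng.randrange(n_vars)}")
    if length == 2:                         # only a unary operator fits
        return ("not", random_mltl(1, n_vars, p_until, rng, max_bound))
    if rng.random() < p_until:
        split = rng.randint(1, length - 2)  # nodes in the left operand
        i = rng.randint(0, max_bound)
        j = rng.randint(i, max_bound)       # interval [i, j], 0 <= i <= j <= 100
        return ("until",
                random_mltl(split, n_vars, p_until, rng, max_bound),
                random_mltl(length - 1 - split, n_vars, p_until, rng, max_bound),
                i, j)
    if rng.random() < 0.5:
        return ("not", random_mltl(length - 1, n_vars, p_until, rng, max_bound))
    split = rng.randint(1, length - 2)
    return ("and",
            random_mltl(split, n_vars, p_until, rng, max_bound),
            random_mltl(length - 1 - split, n_vars, p_until, rng, max_bound))


def size(f: Tuple) -> int:
    """Number of nodes; interval bounds and atom names are not counted."""
    return 1 + sum(size(g) for g in f[1:] if isinstance(g, tuple))


def benchmark(seed: int = 0) -> List[Tuple]:
    rng = random.Random(seed)
    return [random_mltl(L, N, P, rng)
            for L, N, P in product(LENGTHS, NUM_VARS, PROB_UNTIL)
            for _ in range(PER_SETTING)]
```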


[Figures 2 and 3: cactus plots; x-axis: Number of Formulas. Fig. 2 legend: LTL-SAT, LTLf-SAT, SMV, SMT. Fig. 3 legend: LTL-SAT, LTLf-SAT, BMC, KLIVE, SMT.]

Fig. 2. Cactus plot for different MLTL encodings on R formulas: LTL-SAT and LTLf-SAT lines overlap; SMV and SMT lines overlap.

Fig. 3. Cactus plot for different MLTL solving approaches on R formulas: LTL-SAT and LTLf-SAT lines overlap.


⁴ https://docs.rice.edu/confluence/display/CD/NOTS+Overview.
⁵ All experimental materials are at http://temporallogic.org/research/CAV19/. The plots are best viewed online.
(2) NASA-Boeing MLTL Formulas (NB): We use challenging benchmarks [15] created from projects at NASA [17,26] and Boeing [11]. We extract 63 real-life LTL requirements from the SMV models of the benchmarks, and then randomly generate an interval
for each temporal operator. (We replace each ¥ with Lj; ,1]-) We create 3 groups of such 
formulas (63 in each) to test the scalability of different approaches, by restricting the maximal value in the intervals to 1,000, 10,000, and 100,000, respectively.


(3) Random MLTL₀ Formulas (RO): We generated 500 RO formulas in the same way as the R formulas, except that every generated interval was restricted to start from 0; we generated sets of five for each (L, N, P). This small benchmark set serves to compare the performance on MLTL₀ formulas whose SMV encodings were created with and without heuristics.


Correctness Checking. We compared the verdicts from all solvers for every test instance and found no inconsistencies, excluding segmentation faults. This exercise aided the verification of our translator implementations, including diagnosing the need to include FAIRNESS TRUE in BMC models.


Experimental Results. Figure 2 compares encoding times for the R benchmark formulas. We find that (1) encoding MLTL as either LTL or LTLf is not scalable, even when the intervals in the formula are small; (2) the cost of MLTL-to-SMV encoding is comparable to that of MLTL-to-SMT-LIB v2 encoding. Although the costs of encoding MLTL as LTL/LTLf and as SMV are both in $O(K \cdot |cl(\varphi)|)$, where $K$ is the maximal interval length in $\varphi$, the practical gap between the LTL/LTLf encodings and the SMV encoding affirms our conjecture that the SMV model is in general more compact than the corresponding LTL/LTLf formulas. Moreover, because $K$ is kept small in the R formulas, the encoding costs of SMV and SMT-LIB v2 are comparable.

Figure 3 shows total satisfiability checking times for the R benchmarks. Recall that the inputs of both the BMC and KLIVE approaches are SMV models. MLTL-SAT via KLIVE is the fastest solving strategy for MLTL formulas with interval ranges of less than 100. The ratio of satisfiable to unsatisfiable formulas in this benchmark is approximately 4:1. Although BMC is known to be good at detecting short counterexamples, it does not perform as well as the KLIVE and SMT approaches on satisfiable formulas, since most of these formulas only have long counterexamples (of length greater than 1000). While nuXmv successfully checked all such models, Fig. 4 shows that increasing the interval range constraint results in segmentation faults; more than half of our benchmarks produced this outcome for formulas with allowed interval ranges of up to 600. Meanwhile, solving via LTL-SAT/LTLf-SAT is not competitive for any interval range.
The SMT-based approach dominates the model-checking approaches on the scalable NB benchmarks, as shown in Fig. 5. Here, e.g., “BMC-1000” means using BMC to check the group of benchmarks with a maximal interval range of 1,000. Due to segmentation faults, “BMC-1000” and “KLIVE-1000” have almost the same performance: the SMV models generated by our translator MLTLconverter are too large for nuXmv to handle. The performance of the model-checking approaches is thus constrained by the scalability of the model checker (nuXmv). However, the SMT encoding does not face such a bottleneck; see “Z3-1000,” “Z3-10000,” and “Z3-100000” in Fig. 5. We conclude that the SMT approach is the best available strategy for MLTL satisfiability checking.

Fig. 4. Proportion of segmentation faults for sets of 200 R formulas with maximal interval ranges varying from 100 to 1000.

Fig. 5. Cactus plot for the BMC, KLIVE, and SMT-solving approaches on the NB benchmarks; BMC and KLIVE overlap.

Fig. 6. Scatter plot for both the BMC and KLIVE approaches to checking MLTL₀ formulas with/without encoding heuristics.


Finally, we evaluated the performance of the model-checking-based approaches on the RO formulas, given the exponential complexity gap between MLTL-SAT and MLTL₀-SAT. Figure 6 compares the performance of satisfiability solving via the BMC and KLIVE approaches. There is no significant improvement when the SMV encoding heuristics for MLTL₀ are applied. For the BMC solving approach, performance is largely unaffected by the encoding heuristics. For the KLIVE solving approach, the encoding heuristics degrade solving performance. The results support the well-known phenomenon that theoretical analysis and practical evaluation do not always match.
We summarize with three conclusions. (1) For satisfiability checking of MLTL formulas, the new SMT-based approach is best. (2) For satisfiability checking of MLTL formulas with interval ranges of less than 100, MLTL-SAT via KLIVE is fastest. (3) The dedicated encoding heuristics for MLTL₀ do not significantly improve the satisfiability checking time of MLTL₀-SAT over MLTL-SAT, and they do not solve the nuXmv scalability problem.


6 Discussion and Conclusion 


Metric Temporal Logic (MTL) was first introduced in [3], for describing continuous 
behaviors interpreted over infinite real-time traces. The later variants Metric Interval Temporal Logic (MITL) [5] and Bounded Metric Temporal Logic (BMTL) [30] are also interpreted over infinite traces. Intuitively, MLTL is a combination of MITL and
BMTL that allows only bounded, discrete (over natural domain) intervals that are inter- 
preted over finite traces. There are several previous works on the satisfiability of MITL, 
though their tools only support the infinite semantics. Bounded satisfiability checking 
for MITL formulas is proposed in [33], and the reduction from MITL to LTL is pre- 
sented in [20]. Since previous works focus on MITL over infinite traces and there is no 
trivial way to reduce MLTL over finite traces to MITL over infinite traces, the previ- 
ous methodologies are not comparable to those presented in this paper. This includes 
the SMT-based solution of reducing MITL formulas to equi-satisfiable Constraint LTL 
formulas [8]. Compared to that, our new SMT-based approach more directly encodes 
MLTL formulas into the SMT language without translation through an intermediate 
language. 

The contribution of a complete, correct, and open-source MLTL satisfiability checking algorithm and tool opens up avenues for a myriad of future directions, as it makes possible specification debugging of MLTL formulas in design-time verification and benchmark generation for runtime verification. We plan to explore alternative
encodings for improving the performance of MLTL satisfiability checking and work 
toward developing an optimized multi-encoding approach, following the style of the 
previous study for LTL [40]; the current SMT model generated from the MLTL formula 
uses a relatively simple theory (uninterpreted functions). We also plan to explore lazy 
encodings from MLTL formulas to SMT models. For example, instead of encoding the 
whole MLTL formula into a monolithic SMT model, we may be able to decrease overall 
satisfiability-solving time by encoding the MLTL formula in parts with dynamic order- 
ing similar to [15]. To make the output of SMT-based MLTL satisfiability checking 
more usable, we plan to investigate translations from the functions returned from Z3 
for satisfiable instances into more easily parsable satisfying assignments. 
Abstract. Satisfiability Modulo Theories (SMT) solvers with support
for the theory of strings have recently emerged as powerful tools for rea- 
soning about string-manipulating programs. However, due to the com- 
plex semantics of extended string functions, it is challenging to develop 
scalable solvers for the string constraints produced by program analysis 
tools. We identify several classes of simplification techniques that are 
critical for the efficient processing of string constraints in SMT solvers. 
These techniques can reduce the size and complexity of input constraints 
by reasoning about arithmetic entailment, multisets, and string contain- 
ment relationships over input terms. We provide experimental evidence 
that implementing them results in significant improvements over the per- 
formance of state-of-the-art SMT solvers for extended string constraints. 


1 Introduction 


Most programming languages support strings natively and a considerable num- 
ber of programs perform some form of string manipulation. Automated reason- 
ing about string-manipulating programs for verification and test case generation 
purposes is then highly relevant for these languages and programs. Applications 
to security, such as finding SQL injection and XSS vulnerabilities in web appli- 
cations [16,18,23] or proving their absence, are of critical importance. String 
constraints have also been used to generate relational database tables from SQL 
queries for unit testing purposes [21]. These applications require modeling all of 
the string operations that appear in real programs. This is challenging since some 
of those operations are complex and often realized by iterative applications of 
simpler operations. Additionally, since strings in many programming languages 
have variable length, reasoning accurately about them cannot be done by a reduc- 
tion to bounded types such as bit-vectors, and requires instead the development 
of solvers for unbounded strings. To make this type of reasoning more scalable, 
the use of dedicated theory solvers natively supporting common string opera- 
tions has been proposed [5,9]. Some string solvers are fully integrated within 
Satisfiability Modulo Theories (SMT) solvers [4,12]; some are built (externally)
on top of such solvers [9,16, 19]; and others are independent of SMT solvers [23]. 

A major challenge in developing solvers for unbounded string constraints is 
the complex semantics of extended string functions beyond the basic operations 
of string concatenation and equality. Extended functions include replace, which 
replaces a string in another string, and indexof, which returns the position of 
a string in another string. Another challenge is that constraints using extended 
functions are often combined with constraints over other theories, e.g. integer 
constraints over string lengths or applications of indexof, which requires the 
involvement of solvers for those theories. Current string solvers address these 
challenges by reducing constraints with extended string functions to typically 
more verbose constraints over basic functions. As with every reduction, some of 
the higher level structure of the problem may be lost, with negative repercussions 
on the performance and scalability. 

To address this issue, we have developed new techniques that reason about 
constraints with extended string operators before they are reduced to simpler 
ones. This analysis of complex terms can often eliminate the need for expen- 
sive reductions. The techniques are based on reasoning about relationships over 
strings with high-level abstractions, such as their arithmetic relationships (e.g., 
reasoning about their length), their string containment relationships, and their 
relationships as multisets of characters. We have implemented these techniques 
in CVC4, an SMT solver with native support for string reasoning. An experimental evaluation with benchmarks from various applications shows that our new techniques allow CVC4 to significantly outperform other state-of-the-art solvers that target extended string constraints.

Our main contributions are: 


— A novel procedure for proving entailments over arithmetic predicates built from the theory of strings and linear integer arithmetic.

— Extensions of this technique for showing containment relationships between strings.

— A novel simplification technique based on abstracting strings as multisets.

— Experimental evidence that the simplification techniques provide significant performance improvements over current state-of-the-art solvers.


In the remainder of this section, we discuss related work. In Sect. 2, 
we provide some background on the theory of strings and how solvers 
reduce extended functions. In Sects. 3, 4, and 5, we describe, respectively, our
arithmetic-based, containment-based, and multiset-based simplification tech- 
niques. Section 6 describes our implementation of those techniques, and Sect. 7 
presents our evaluation. 


Related Work. Various approaches to solving constraints over extended string 
functions have been proposed. Saxena et al. [16] showed that constraints from the 
symbolic execution of JavaScript code contain a significant number of extended 
string functions, which underlines their importance. Their approach translates 
string constraints to bit-vector constraints, similar to other approaches based on 
bounded strings such as HAMPI [9]. Bjørner et al. [5] proposed native support
for extended string operators in string solvers for scaling symbolic execution 
of .NET code. They reduce extended string functions to basic ones after get- 
ting bounds for string lengths from an integer solver. They also showed that 
constraints involving unbounded strings and replace are undecidable. PASS [11] 
reduces string constraints over extended functions to arrays. Z3-str and its suc- 
cessors [4, 24,25] reduce extended string functions to basic functions eagerly dur- 
ing preprocessing. S3 [18] reduces recursive functions such as replace incremen- 
tally by splitting and unfolding. Its successor S3P [19] refines this reduction 
by pruning the resulting subproblems for better performance. CVC4 [3] reduces
constraints with extended functions lazily and leverages context-dependent sim- 
plifications to simplify the reductions [15]. TRAU [1] reduces certain extended 
functions, such as replace, to context-free membership constraints. OSTRICH [7] 
implements a decision procedure for a subset of constraints that include extended 
string functions. The simplification techniques presented in this paper are agnos- 
tic to the underlying solving procedure, so they can be combined with all of these 
approaches. 


2 Preliminaries 


We work in the context of many-sorted first-order logic with equality and assume 
the reader is familiar with the notions of signature, term, literal, formula, and 
formal interpretation of formulas. We review a few relevant definitions in the 
following. A theory is a pair $T = (\Sigma, I)$ where $\Sigma$ is a signature and $I$ is a class of $\Sigma$-interpretations, the models of $T$. We assume $\Sigma$ contains the equality predicate $\approx$, interpreted as the identity relation, and the predicates $\top$ (for true) and $\bot$ (for false). A $\Sigma$-formula $\varphi$ is satisfiable (resp., unsatisfiable) in $T$ if it is satisfied by some (resp., no) interpretation in $I$. We write $\models_T \varphi$ to denote that the $\Sigma$-formula $\varphi$ is $T$-valid, i.e., is satisfied in every model of $T$. Two $\Sigma$-terms $t_1$ and $t_2$ are equivalent in $T$ if $\models_T t_1 \approx t_2$.

We consider an extended theory $T_S$ of strings and length equations, whose signature $\Sigma_S$ is given in Fig. 1 and whose models differ only on how they interpret variables.¹ We assume a fixed finite alphabet $A$ of characters which includes the digits $\{0, \ldots, 9\}$. The signature includes the sorts Bool, Int, and Str denoting the Booleans, the integers ($\mathbb{Z}$), and the Kleene closure of $A$ ($A^*$), respectively. The top half of Fig. 1 includes the usual symbols of linear integer arithmetic, interpreted as expected, a string literal $l$ for each word/string of $A^*$, a variadic function symbol con, interpreted as word concatenation, and a function symbol len, interpreted as the word length function. We write $\epsilon$ for the empty word and abbreviate len($s$) as $|s|$. We use words over the characters a, b, and c, as in abca, as concrete examples of string literals.

We refer to the function symbols in the bottom half of the figure as extended functions and refer to terms containing them as extended terms. A position in a string l ∈ A* is a non-negative integer n smaller than the length of l that identifies the (n + 1)th character of l, with 0 identifying the first character, 1 the second, and so on. For all models I of T_S, all l, l1, l2 ∈ A*, and n, m ∈ ℤ, substr^I(l, n, m) (the interpretation of substr in I applied to l, n, m) is the longest substring of l starting at position n with length at most m, or ε if n is an invalid position or m is not positive; contains^I(l1, l2) is true if and only if l2 is a substring of l1, with ε being a substring of every string; indexof^I(l1, l2, n) is the position of the first occurrence of l2 in l1 at or after position n, n if l2 is empty and 0 ≤ n ≤ |l1|, and −1 if n is an invalid position, or if no such occurrence exists; replace^I(l, l1, l2) is the result of replacing the first occurrence of l1 in l by l2, l if l does not contain l1, or the result of prepending l2 to l if l1 is empty; str.to.int^I(l) is the non-negative integer represented by l in decimal notation or −1 if the string contains non-digit characters; int.to.str^I(n) is the result of converting n to the corresponding string in decimal notation if n is non-negative, or ε otherwise. We write substr(t, u) as shorthand for the term substr(t, u, |t|), i.e. the suffix of t starting at position u.

1 Our implementation supports a larger set of symbols, but for brevity, we only show the subset of the symbols used throughout this paper.

    n : Int  for all n ∈ ℕ      + : Int × Int → Int      − : Int → Int      ≥ : Int × Int → Bool
    l : Str  for all l ∈ A*     con : Str × ··· × Str → Str      len : Str → Int

    substr : Str × Int × Int → Str        contains : Str × Str → Bool
    indexof : Str × Str × Int → Int       replace : Str × Str × Str → Str
    str.to.int : Str → Int                int.to.str : Int → Str

Fig. 1. Functions in signature Σ_S. Str and Int denote strings and integers respectively.
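To make the semantics of the extended functions concrete, the following Python sketch transcribes the definitions above into executable reference functions. It is an illustration only: the function names are ours and are not CVC4's API, and mapping the empty string to −1 in `str_to_int` is an assumption borrowed from SMT-LIB, since the text above only mentions non-digit characters.

```python
def substr(t: str, n: int, m: int) -> str:
    """Longest substring of t starting at position n with length at most m;
    empty if n is not a valid position (0 <= n < |t|) or m is not positive."""
    if not 0 <= n < len(t) or m <= 0:
        return ""
    return t[n:n + m]

def contains(t: str, s: str) -> bool:
    """True iff s is a substring of t; the empty string is in every string."""
    return s in t

def indexof(t: str, s: str, n: int) -> int:
    """First occurrence of s in t at or after position n, else -1."""
    if s == "":
        return n if 0 <= n <= len(t) else -1
    if not 0 <= n < len(t):
        return -1  # explicit check: Python's find would treat n < 0 as "from the end"
    return t.find(s, n)  # find returns -1 when there is no occurrence

def replace(t: str, s: str, r: str) -> str:
    """Replace the first occurrence of s in t by r; prepend r if s is empty."""
    if s == "":
        return r + t
    i = t.find(s)
    return t if i < 0 else t[:i] + r + t[i + len(s):]

def str_to_int(t: str) -> int:
    """Integer denoted by t in decimal, or -1 (assumption: also for "")."""
    if t and all(c in "0123456789" for c in t):
        return int(t)
    return -1

def int_to_str(n: int) -> str:
    """Decimal representation of n if n is non-negative, else the empty string."""
    return str(n) if n >= 0 else ""
```

The shorthand substr(t, u) corresponds to `substr(t, u, len(t))` in this sketch.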

Note that the semantics for replace and indexof correspond to the semantics 
in the current draft of the SMT-LIB standard for the theory of strings [17]; they 
are slightly different from the ones described in previous work [4, 15, 20]. 


2.1 Solving Extended String Constraints (with Simplification) 


Various efficient solvers have been designed for the satisfiability problem for quantifier-free T_S-constraints, including CVC4 [3], S3# [20] and Z3STR3 [4]. In this section, we give an overview of how these solvers process extended functions in practice.

Generally speaking, constraints involving extended functions are converted to 
basic ones through a series of reductions performed in an incremental fashion by 
the solver. Operators whose reduction requires universal quantification are dealt 
with by guessing upper bounds on the lengths of input strings or by lazily adding 
constraints that block models that do not satisfy extended string constraints. 


Example 1. To determine the satisfiability of ¬contains(t, s), the application of contains is reduced to constraints that ensure that s is not a substring of t at any position. Assuming we have a fixed upper bound n on the length of t, the above constraint is equivalent to the finite conjunction substr(t, 0, |s|) ≉ s ∧ ··· ∧ substr(t, n, |s|) ≉ s. Each application of substr is then eliminated by introducing an equality that constrains a fresh variable x_i to have the semantics of that substring. Thus, reducing the formula above results in

$$\bigwedge_{i=0}^{n} \Big( |t| \geq i + |s| \Rightarrow \big( x_i \not\approx s \,\wedge\, t \approx \mathrm{con}(x_i^{pre}, x_i, x_i^{post}) \,\wedge\, |x_i^{pre}| \approx i \,\wedge\, |x_i| \approx |s| \big) \Big)$$

where x_i, x_i^pre, x_i^post are fresh string variables.² The above conjunction involves only string concatenation, string length, and equality, and thus can be handled by a string solver with support for word equations with length constraints.
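The claim in Example 1 that the bounded conjunction is equivalent to ¬contains(t, s) can be spot-checked by brute force. The Python sketch below (all helper names are hypothetical) compares the two on every short string over a two-letter alphabet, taking the bound n to be |t| so that every position of t is covered. It checks the substr-level conjunction, not the fully reduced word-equation form.

```python
from itertools import product

def substr(t: str, n: int, m: int) -> str:
    # Empty if n is not a valid position or m is not positive
    if not 0 <= n < len(t) or m <= 0:
        return ""
    return t[n:n + m]

def not_contains_by_reduction(t: str, s: str, bound: int) -> bool:
    # The finite conjunction substr(t, i, |s|) != s for i = 0..bound
    return all(substr(t, i, len(s)) != s for i in range(bound + 1))

def strings(alphabet: str, max_len: int):
    for k in range(max_len + 1):
        for chars in product(alphabet, repeat=k):
            yield "".join(chars)

def check(alphabet: str = "ab", max_len: int = 4) -> bool:
    for t in strings(alphabet, max_len):
        for s in strings(alphabet, max_len):
            # a bound of |t| covers every position of t
            if not_contains_by_reduction(t, s, len(t)) != (s not in t):
                return False
    return True
```

With a bound that is too small the conjunction is no longer equivalent: for t = aab and s = b, positions 0 and 1 alone miss the occurrence at position 2, so `not_contains_by_reduction("aab", "b", 1)` returns True even though aab contains b.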


The reduction in Example 1 introduces 5·n theory literals over basic string functions and 3·n string variables. A full reduction accounting for all corner
cases of substr is even more complex and thus more expensive to process, even for 
small values of n. These performance challenges can be addressed by aggressive 
simplifications that eliminate extended functions using high-level reasoning, as 
shown in the next example. 


Example 2. Consider an instance of the previous example where s = con(a, x) and t = con(b, substr(x, 0, n)). A full reduction of ¬contains(t, s) that eliminates all applications of substr, including those in t, introduces 5·n + 5 new theory literals and 3·n + 3 string variables. However, based on the semantics of contains, it is easy to see that ¬contains(t, s) is T_S-valid: if t were to contain s, then s would have to occur in the portion of t after its first character b, since the first character of s is a. However, con(a, x) cannot be contained in substr(x, 0, n), since the length of the former is at least |x| + 1, while the length of the latter is at most |x|. A solver which recognizes that ¬contains(t, s) can be simplified to ⊤ in this case can avoid the reduction altogether.
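As a sanity test (not a proof), the T_S-validity claim of Example 2 can be checked on bounded instances. The alphabet, the length bound and the range of n in this Python sketch are arbitrary choices, and the function names are hypothetical.

```python
from itertools import product

def substr(t: str, n: int, m: int) -> str:
    # Empty if n is not a valid position or m is not positive
    if not 0 <= n < len(t) or m <= 0:
        return ""
    return t[n:n + m]

def example2_holds(alphabet: str = "ab", max_len: int = 5, max_n: int = 7) -> bool:
    for k in range(max_len + 1):
        for chars in product(alphabet, repeat=k):
            x = "".join(chars)
            s = "a" + x                          # s = con(a, x)
            for n in range(-1, max_n + 1):
                t = "b" + substr(x, 0, n)        # t = con(b, substr(x, 0, n))
                if s in t:                       # contains(t, s) would refute the claim
                    return False
    return True
```

Note that |t| ≤ |s| here, with equality possible, so the length bound alone does not settle the question; the mismatch between the first characters b and a is what rules out an occurrence at position 0.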


We advocate for aggressive simplification techniques to improve the performance of string solvers for extended functions. In the next sections, we describe several classes of such techniques that can be applied to inputs as a preprocessing step or during solving as part of a context-dependent solving strategy [15]. We present them as sets R of rewrite rules of the form t →_R s, where s is a (simplified) term equivalent to t in T_S. We assume a deterministic application strategy for these rules, such that each term t rewrites to a unique simplified form, denoted by t↓, which is irreducible by the rules. We split our simplifications into four categories, presented in Figs. 4, 6, 7 and 8.³


3 Arithmetic-Based String Simplification 


To simplify string terms, it is useful to establish relationships between quantities such as the lengths of strings. For example, contains(t, s) can be simplified to ⊥ for a particular s and t if it can be inferred that |s| is strictly greater than |t|. This section defines an inference system for such arithmetic relationships and the simplifications that it enables.

2 This formula is a simplified form of the general reduction. The general reduction also expresses that i is a valid position in t and that the third argument of substr is non-negative [15].

3 Some specialized rules have been omitted for space reasons.

We are interested in proving the T_S-validity of formulas of the form u ≥ 0, where u is a Σ_S-term of integer type. We describe an inference system as a set of rules for deriving judgments of the form ⊢ u ≥ 0 and a specific rule application strategy we have implemented. The inference system is sound in the sense that ⊨_{T_S} u ≥ 0 whenever ⊢ u ≥ 0 is derivable in it. It is, however, incomplete as it may fail to derive ⊢ u ≥ 0 in some cases when ⊨_{T_S} u ≥ 0. This incompleteness is by design, since proving the T_S-validity of inequalities is generally expensive due to the NP-hardness of linear integer arithmetic. Without loss of generality, we require that the term u be in a simplified form, where terms of the form |l| with l a string literal of n characters are rewritten to n, terms of the form |con(t1, ..., tn)| are rewritten to |t1| + ··· + |tn|, and like monomials in arithmetic terms are combined in the usual way (e.g., 2·|x| + |x| is rewritten to 3·|x|).
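The simplified form required above can be computed with a small bookkeeping pass. The Python sketch below uses a hypothetical term encoding and hypothetical function names; it expands |l| to the literal's length and |con(t1, ..., tn)| to a sum, and combines like monomials. It only covers length terms, not integer variables or integer-valued extended functions.

```python
# Hypothetical encoding used only in this sketch:
#   a Python str is a string literal, ("var", "x") is a string variable,
#   ("con", t1, ..., tn) is a concatenation, and any other tuple such as
#   ("substr", t, v, w) is an opaque extended term.
# A polynomial is a dict from atoms to coefficients: key 1 holds the
# constant, and the atom ("len", t) stands for |t|.

def add_scaled(acc: dict, poly: dict, m: int) -> None:
    """acc += m * poly, in place."""
    for atom, c in poly.items():
        acc[atom] = acc.get(atom, 0) + m * c

def length_poly(t) -> dict:
    """Polynomial for |t| after rewriting |l| to n and expanding |con(...)|."""
    if isinstance(t, str):                 # |l| for a literal of n characters
        return {1: len(t)} if t else {}
    if t[0] == "con":                      # |con(t1,...,tn)| -> |t1| + ... + |tn|
        poly: dict = {}
        for child in t[1:]:
            add_scaled(poly, length_poly(child), 1)
        return poly
    return {("len", t): 1}                 # |x| or |extended term| stays an atom

def normalize(*monomials) -> dict:
    """Combine like monomials of a sum of m * |t| and drop zero coefficients."""
    acc: dict = {}
    for m, t in monomials:
        add_scaled(acc, length_poly(t), m)
    return {atom: c for atom, c in acc.items() if c != 0}
```

For instance, `normalize((2, x), (1, x))` with `x = ("var", "x")` yields `{("len", x): 3}`, the 3·|x| of the example above.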


Definition 1 (Polynomial Form). An arithmetic term u is in polynomial form if u = m1·u1 + ··· + mn·un + m, where m1, ..., mn are non-zero integer constants, m is an integer constant, and each of u1, ..., un is a unique term and one of the following:

– an integer variable,
– an application of length to a string variable, e.g. |x|,
– an application of length to an extended function, e.g. |substr(t, v, w)|, or
– an application of an extended function of integer type, e.g. indexof(t, s, v).


Given u in polynomial form, our inference system uses a set of over- and under-approximations for showing that u ≥ 0 holds in all models of T_S. We define two auxiliary rewrite systems, denoted →_O and →_U. If u rewrites to v (in zero or more steps) in →_O, written u →*_O v, we say that v is an over-approximation of u. We can prove in that case that ⊨_{T_S} v ≥ u. Dually, if u rewrites to v in →_U, written u →*_U v, we say that v is an under-approximation of u and can prove that ⊨_{T_S} u ≥ v. Based on these definitions, the core of our inference system can be summarized by the single inference rule schema provided in Fig. 2 together with the conditional rewrite systems →_O and →_U, which are defined inductively in terms of the inference system and each other.

A majority of the rewrite rules have side conditions requiring the derivability of certain judgments in the same inference system. To improve their readability we take some liberties with the notation and write ⊢ u1 ≥ u2, say, instead of ⊢ u1 − u2 ≥ 0. For example, |substr(t, v, w)| is under-approximated by w if it can be inferred that the interval from v to v + w is a valid range of positions in string t, which is expressed by the side conditions ⊢ v ≥ 0 and ⊢ |t| ≥ v + w. Note that some arithmetic terms, such as |substr(t, v, w)|, can be approximated in multiple ways, hence the need for a strategy for choosing the best approximation for arithmetic string terms, described later. The rules for polynomials are written modulo associativity of + and state that a monomial m·v in them can be over- or under-approximated based on the sign of the coefficient m. For simplicity, we silently assume in the figure that basic arithmetic simplifications are applied after each rewrite step to put the right-hand side in polynomial form.

        u →*_U n     n ≥ 0
        ------------------
             ⊢ u ≥ 0

    |t| →_U 0
    |substr(t, v, w)| →_U  w          if ⊢ v ≥ 0 and ⊢ |t| ≥ v + w
                           |t| − v    if ⊢ v ≥ 0 and ⊢ v + w ≥ |t|
    |replace(t, s, r)| →_U  |t|        if ⊢ |r| ≥ |s| or ⊢ |r| ≥ |t|
                            |t| − |s|
    |int.to.str(v)| →_U 1   if ⊢ v ≥ 0
    indexof(t, s, v) →_U −1
    str.to.int(t) →_U −1
    m·v + u →_U m·w + u     if v →_U w and m > 0, or v →_O w and m < 0

    |substr(t, v, w)| →_O …
    |substr(t, v, w)| →_O …
    |int.to.str(v)| →_O …
    indexof(t, s, v) →_O …
    |replace(t, s, r)| →_O  |t|   if ⊢ |s| ≥ |r|
                            …
    m·v + u →_O m·w + u     if v →_O w and m > 0, or v →_U w and m < 0

Fig. 2. Rules for arithmetic entailment based on under- and over-approximations computed for arithmetic terms containing extended string operators. We write t, s, r to denote string terms, u, u′, v, w to denote integer terms and m, n to denote integer constants.


Example 3. Let u be |replace(x, aa, b)|. Because ⊢ |aa| ≥ |b|, the first case of the over-approximation rule for replace applies, and we get that u →_O |x|. This reflects that the result of replacing the first occurrence, if any, of aa in x with b is no longer than x.


Example 4. Let u be the same as in the previous example and let v be −1·u + 2·|x|. Since u →_O |x| and the coefficient of u in v is negative, we have that v →_U −1·|x| + 2·|x|, which simplifies to |x|; moreover, |x| →_U 0. Thus, v →*_U 0 and so ⊢ v ≥ 0. In other words, we can use the approximations to show that u is at most 2·|x|.
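The sign rule for monomials used in Example 4 can be sketched in a few lines of Python. Polynomials are dictionaries from atoms to coefficients (a hypothetical encoding, with key 1 for the constant), and the over-approximation u →_O |x| from Example 3 is supplied by hand rather than derived from the rules of Fig. 2. The function names are illustrative only.

```python
def substitute(poly: dict, atom, approx: dict) -> dict:
    """Rewrite m*atom + rest into m*approx + rest, then drop zero coefficients."""
    m = poly[atom]
    out = {a: c for a, c in poly.items() if a != atom}
    for a, c in approx.items():
        out[a] = out.get(a, 0) + m * c
    return {a: c for a, c in out.items() if c != 0}

def under_step(poly: dict, atom, under=None, over=None) -> dict:
    """One ->_U step on the monomial of `atom`: a positive coefficient needs an
    under-approximation of the atom, a negative one an over-approximation."""
    approx = under if poly[atom] > 0 else over
    if approx is None:
        raise ValueError("no approximation in the direction required by the sign")
    return substitute(poly, atom, approx)

def is_nonneg_constant(poly: dict) -> bool:
    """True if poly is a constant n with n >= 0 (the empty dict is 0)."""
    return all(a == 1 for a in poly) and poly.get(1, 0) >= 0

# Example 4: u = |replace(x, aa, b)| with u ->_O |x|, and v = -1*u + 2*|x|.
U, X = "|replace(x,aa,b)|", "|x|"
v = {U: -1, X: 2}
step1 = under_step(v, U, over={X: 1})   # -1*|x| + 2*|x| simplifies to |x|
step2 = under_step(step1, X, under={})  # |x| ->_U 0
```

Since `step2` is the constant 0, the sketch concludes ⊢ v ≥ 0, as in the example.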


3.1 A Strategy for Approximation 


The rewrite systems →_O and →_U allow for many possible derivations. Thus, it is important to devise a strategy that is efficient and succeeds often in practice. We use a greedy rule application strategy that favors rule applications leading to the cancellation of monomials. For example, consider the term |x| − |substr(y, 0, |x|)|, and observe that the subtrahend can be over-approximated either by |y| or by |x|. However, proving the T_S-validity of |x| − |substr(y, 0, |x|)| ≥ 0 with the former over-approximation is impossible since |x| − |y| ≥ 0 does not hold in all models of T_S. In contrast, the latter approximation produces |x| − |x| ≥ 0, which is trivially T_S-valid.


STR-ARITH-APPROX(u), where u = u_Z + u_S + u_E + m and:
– u_Z = m_1^Z·y_1 + ··· + m_n^Z·y_n,
– u_S = m_1^S·|x_1| + ··· + m_p^S·|x_p|,
– u_E = m_1^E·v_1 + ··· + m_q^E·v_q,
for variables x_1, ..., x_p, y_1, ..., y_n and extended terms v_1, ..., v_q:

1. If q > 0, choose a v_i and v_i′ that maximize the following criteria (in descending order), where u′ = (u[m_i^E·v_i ↦ m_i^E·v_i′])↓:
   (a) (Soundness) v_i →_U v_i′ if m_i^E > 0 and v_i →_O v_i′ if m_i^E < 0;
   (b) (Avoids new terms) Minimizes the size of negcoeff(u′) \ negcoeff(u);
   (c) (Cancels existing terms) Maximizes the size of negcoeff(u) \ negcoeff(u′).
   Return u →_U u′.
2. If p > 0 and m_j^S > 0 for some j, return u →_U (u[m_j^S·|x_j| ↦ 0])↓.

Fig. 3. A greedy strategy for showing arithmetic entailments in the theory T_S. We write negcoeff(u) to denote the set of terms whose coefficient is negative in u.


Recall that, given an arithmetic inequality u ≥ 0, our goal is to find a reduction u →*_U n where n is a non-negative constant. Our strategy for choosing which rule of →_U to apply to u is given in Fig. 3. We decompose u into three parts: the portion u_Z consisting of a sum of integer variables, the portion u_S consisting of a sum of lengths of string variables, and the remaining portion u_E, which is a sum of monomials involving extended terms v_1, ..., v_q as defined in Definition 1.

Since there are multiple choices for how terms in u_E are approximated, the strategy focuses primarily on this portion. In particular, we apply an approximation for one of the terms v_i, under-approximating or over-approximating depending on the sign of its coefficient, and replace the monomial in u by its corresponding approximation. The choice of v_i and v_i′ is based on maximizing the likelihood that the overall derivation will produce a non-negative constant.

For a term u in polynomial form, let negcoeff(u) be the set of integer terms whose coefficient is negative in u, e.g. negcoeff(y_1 + −1·y_2) = {y_2}. Terms in this set can be seen as obligations for proving entailments in our derivations, since if y_2 ∈ negcoeff(u), our derivation must apply a rule that introduces a term with a positive coefficient for y_2. In Fig. 3, we say that our choice of v_i →_U v_i′ avoids new terms if it does not have the effect of adding any new terms to negcoeff(u), and cancels existing terms if it has the effect of removing terms from this set. If the portion u_E is empty, we apply the rule |x_j| →_U 0 if there exists a monomial m_j^S·|x_j| where m_j^S is positive. This rule is applied with lowest priority because these monomials may help to cancel negative terms introduced by the other steps.

Step 1 depends on knowing the set of possible one-step approximations v_i →_U v_i′ and v_i →_O v_i′ for terms from u. These are determined using the rules of Fig. 2. Whenever applicable, we break ties between rewrites in Step 1 by considering a fixed arbitrary ordering over extended terms.


Example 5. Let u be 1 + |t1| + |t2| − |x1|, where t1 is substr(x2, 1, |x2| + |x4|) and t2 is replace(x1, x2, x3). Step 1 of STR-ARITH-APPROX considers the possible approximations |t1| →_U |x2| − 1 and |t2| →_U |x1| − |x2|. Note that under-approximations are needed because the coefficients of |t1| and |t2| are positive. The first approximation is an instance of the third rule in Fig. 2, noting that both ⊢ 1 ≥ 0 and ⊢ 1 + |x2| + |x4| ≥ |x2| are derivable by a basic strategy that, wherever applicable, under-approximates string length terms as zero. Our strategy chooses the first approximation since it introduces no new negative coefficient terms, thus obtaining u →_U |x2| + |t2| − |x1|. We now choose the approximation |t2| →_U |x1| − |x2|, noting that it introduces no new negative coefficient terms and cancels an existing one, |x1|. After arithmetic simplification, we have derived u →*_U 0, and hence ⊢ u ≥ 0.
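A compact Python sketch of STR-ARITH-APPROX, under simplifying assumptions: the one-step approximations of each extended term are supplied as input lists instead of being derived from the rules of Fig. 2, and Step 2 drops all positive |x_j| monomials at once, which gives the same result as applying it repeatedly. Names and encoding are hypothetical. Candidates are ranked by the tuple (new negative terms, −cancelled terms, term order, candidate order), mirroring criteria (b) and (c) and the fixed tie-breaking order.

```python
def substitute(poly: dict, atom, approx: dict) -> dict:
    """Rewrite m*atom + rest into m*approx + rest, then drop zero coefficients."""
    m = poly[atom]
    out = {a: c for a, c in poly.items() if a != atom}
    for a, c in approx.items():
        out[a] = out.get(a, 0) + m * c
    return {a: c for a, c in out.items() if c != 0}

def negcoeff(poly: dict) -> set:
    """Terms (never the constant key 1) whose coefficient is negative."""
    return {a for a, c in poly.items() if c < 0 and a != 1}

def str_arith_approx(poly: dict, candidates: dict, length_vars: set, max_steps: int = 100) -> dict:
    """candidates: extended atom -> (list of ->_U results, list of ->_O results);
    dict order is the fixed tie-breaking order. length_vars: the atoms |x_j|."""
    for _ in range(max_steps):              # guard in case approximations cycle
        options = []
        for i, atom in enumerate(candidates):
            if atom not in poly:
                continue
            under, over = candidates[atom]
            # (a) soundness: the sign of the coefficient picks the direction
            for j, approx in enumerate(under if poly[atom] > 0 else over):
                new = substitute(poly, atom, approx)
                added = negcoeff(new) - negcoeff(poly)     # (b) avoid new terms
                removed = negcoeff(poly) - negcoeff(new)   # (c) cancel terms
                options.append(((len(added), -len(removed), i, j), new))
        if not options:
            break
        poly = min(options, key=lambda o: o[0])[1]
    # Step 2: |x_j| ->_U 0 for string-variable lengths with positive coefficient
    return {a: c for a, c in poly.items() if not (a in length_vars and c > 0)}

def entails_nonneg(poly: dict) -> bool:
    return all(a == 1 for a in poly) and poly.get(1, 0) >= 0

# Example 5: u = 1 + |t1| + |t2| - |x1|
T1, T2, X1, X2 = "|t1|", "|t2|", "|x1|", "|x2|"
u = {1: 1, T1: 1, T2: 1, X1: -1}
cands = {T1: ([{X2: 1, 1: -1}], []),   # |t1| ->_U |x2| - 1
         T2: ([{X1: 1, X2: -1}], [])}  # |t2| ->_U |x1| - |x2|
result = str_arith_approx(u, cands, {X1, X2})
```

In the first round the |t1| candidate wins because it adds no negative term, while the |t2| candidate would add −|x2|; the second round then cancels everything, so `result` is 0. The same function also reproduces the earlier |x| − |substr(y, 0, |x|)| example, preferring the over-approximation |x| over |y|.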


One can show that our strategy is sound, terminating, and deterministic. This means that applying STR-ARITH-APPROX to completion produces a unique rewrite chain of the form u →_U u_1 →_U ··· →_U u_n for a finite n, where each step is an application of one of the rewrite rules from Fig. 2.


3.2 Simplification Rules with Arithmetic Side Conditions 


We use the inference system from the previous section for simplifications of string 
terms with arithmetic side conditions. Figure 4 summarizes those simplifications. 

The first rule rewrites a string equality to ⊥ if one of the two sides can be inferred to be strictly longer than the other. In the second rule, if one side of an equality, con(s, r, q), is such that the sum of lengths of s and q alone can be shown to be greater than or equal to the length of the other side, then r must be empty. The third rule recognizes that string containment reduces to string equality when it can be inferred that string s is at least as long as the string t that must contain it. The next rule captures the fact that substring simplifies to the empty string if it can be inferred that its position v is not within bounds, or its length w is not positive. In the figure, we write that rule with a disjunctive side condition; this is a shorthand to denote that we can pick any disjunct and show that it holds assuming the negation of the other disjuncts. We can use those assumptions to perform substitutions to simplify the derivation. Concretely, to show ⊢ u1 ≥ u2 ∨ ··· ∨ u ≉ u′ it is sufficient to infer ⊢ (u1 ≥ u2)[u ↦ u′]. We demonstrate this with an example.

    t ≈ s → ⊥                                       if ⊢ |t| ≥ |s| + 1
    t ≈ con(s, r, q) → t ≈ con(s, q) ∧ r ≈ ε        if ⊢ |s| + |q| ≥ |t|
    contains(t, s) → t ≈ s                          if ⊢ |s| ≥ |t|

    substr(t, v, w) → ε                             if ⊢ 0 > v ∨ v ≥ |t| ∨ 0 ≥ w
    substr(con(t, s), v, w) → substr(s, v − |t|, w) if ⊢ v ≥ |t|
    substr(con(s, t), v, w) → substr(s, v, w)       if ⊢ |s| ≥ v + w
    substr(con(t, s), 0, w) → con(t, substr(s, 0, w − |t|))   if ⊢ w ≥ |t|

    indexof(t, s, v) → ite(substr(t, v) ≈ s, v, −1)   if ⊢ v + |s| ≥ |t|

Fig. 4. String simplification rules. Letters t, s, r, q denote string terms; v, w denote integer terms.


Example 6. Consider the term substr(t, |t| + w, w). Our rules may simplify this term to ε by inferring that its start position (|t| + w) is not within the bounds of t if we assume that its size (w) is positive. In detail, assume that w > 0 (the negation of the last disjunct in the side condition of the fourth rule), which is equivalent to w ≈ |x| + 1 where x is a fresh string variable and |x| denotes an unknown non-negative quantity. It is sufficient to derive the formula obtained by replacing all occurrences of w by |x| + 1 in the disjunct |t| + w ≥ |t| to show that the start position of our term is out of bounds. After simplification, we obtain |x| + 1 ≥ 0, which is trivial to derive.
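Written out as a derivation, the argument of Example 6 reads as follows. This is a restatement of the example using the substitution convention and the rule |x| →_U 0, not additional machinery.

```latex
\begin{aligned}
&\vdash\; 0 > |t| + w \;\lor\; |t| + w \geq |t| \;\lor\; 0 \geq w
  && \text{side condition for } \mathit{substr}(t, |t| + w, w) \to \epsilon \\
&\text{assume } \neg(0 \geq w),\ \text{i.e. } w \approx |x| + 1
  && x \text{ a fresh string variable} \\
&(|t| + w \geq |t|)[w \mapsto |x| + 1] \;=\; |t| + |x| + 1 \geq |t|
  && \text{substitute into the middle disjunct} \\
&\leadsto\; |x| + 1 \geq 0
  && \text{cancel } |t| \\
&|x| + 1 \;\to_U\; 1,\quad 1 \geq 0
  && \text{by } |x| \to_U 0, \text{ hence } \vdash |x| + 1 \geq 0
\end{aligned}
```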


The next two rules in Fig. 4 apply if we can infer respectively that the start
position of the substring comes strictly after a prefix t or that the end position 
of the substring comes strictly before a suffix t of the first argument string. In 
either case, t can be dropped. 


Example 7. Let t be substr(con(x1, replace(x2, x3, x4)), 0, w), where w is |x1| − |x2|. We have that t → substr(x1, 0, w), noting that ⊢ |x1| ≥ 0 + |x1| − |x2|. In other words, only the first component x1 of the string concatenation is relevant to the substring since its end point must occur before the end of x1.
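Example 7 can be spot-checked by brute force: for all short strings x1, ..., x4, the substring on the left agrees with substr(x1, 0, w). The helpers below restate, in Python, the substr and replace semantics given earlier; they are illustrative and not the solver's implementation, and the bounds are arbitrary.

```python
from itertools import product

def substr(t: str, n: int, m: int) -> str:
    # Empty if n is not a valid position or m is not positive
    if not 0 <= n < len(t) or m <= 0:
        return ""
    return t[n:n + m]

def replace(t: str, s: str, r: str) -> str:
    # First occurrence of s replaced by r; r is prepended if s is empty
    if s == "":
        return r + t
    i = t.find(s)
    return t if i < 0 else t[:i] + r + t[i + len(s):]

def example7_holds(alphabet: str = "ab", max_len: int = 2) -> bool:
    words = ["".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k)]
    for x1, x2, x3, x4 in product(words, repeat=4):
        w = len(x1) - len(x2)             # w = |x1| - |x2| <= |x1|
        lhs = substr(x1 + replace(x2, x3, x4), 0, w)
        if lhs != substr(x1, 0, w):
            return False
    return True
```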


The final rule for substr shows that a prefix of a substring can be pulled upwards 
if the start position is zero and we can infer that the substring is guaranteed to 
include at least a prefix string t. Finally, if we can infer that the last position of s 
in t starting from position v is at or beyond the end of t, then the indexof term can 
be rewritten as an if-then-else (ite) term that checks whether s is a suffix of t. 


4 Containment-Based String Simplification 


This section provides an overview of simplifications that are based on reasoning 
about the containment relationship between strings. We describe an inference 
system for deriving when one string is definitely contained or not contained in 
another. Following the notation from the last section, we write ⊢ t ∋ s to denote the judgment of our inference system, denoting that string t contains string s in all models of T_S. Conversely, we write ⊢ t ∌ s to denote that string t does not contain string s. We write ⊢ t ∋ᵖ s (resp., ⊢ t ∋ˢ s) to denote the judgment indicating that s must be a prefix (resp., suffix) of t.

Containment (∋):
    l1 contains l2  /  ⊢ con(l1, t) ∋ l2
    ⊢ s ∋ r  /  ⊢ con(t, s) ∋ r
    ⊢ t ∋ˢ r,  ⊢ s ∋ᵖ q  /  ⊢ con(t, s) ∋ con(r, q)
    ⊢ t ∋ substr(t, v, w)

Non-containment (∌):
    l1 does not contain l2  /  ⊢ l1 ∌ con(l2, t)
    ⊢ r ∌ t  /  ⊢ r ∌ con(s, t)
    ⊢ l1 \ l2 ∌ t  /  ⊢ l1 ∌ con(l2, t)

Prefix (∋ᵖ):
    l2 is a prefix of l1  /  ⊢ con(l1, t) ∋ᵖ l2
    ⊢ s ∋ᵖ r  /  ⊢ con(t, s) ∋ᵖ con(t, r)
    ⊢ v ≤ 0  /  ⊢ t ∋ᵖ substr(t, v, w)

Suffix (∋ˢ):
    l2 is a suffix of l1  /  ⊢ con(t, l1) ∋ˢ l2
    ⊢ s ∋ˢ r  /  ⊢ con(s, t) ∋ˢ con(r, t)
    ⊢ v + w ≥ |t|  /  ⊢ t ∋ˢ substr(t, v, w)

Fig. 5. Inferences for string containment ∋, is-prefix ∋ᵖ and is-suffix ∋ˢ (each rule written as premises / conclusion).


Rules for inferring judgments of these forms are given in Fig. 5. Like our rules for arithmetic, these rules are solely based on the syntactic structure of terms, so inferences in this system can be computed statically. Both the assumptions and conclusions of the rules assume associativity of string concatenation with identity element ε, that is, con(t, s) may refer to a term of the form con(con(t1, t2), s) = con(t1, t2, s) or alternatively to con(ε, s) = s. Most of the rules are straightforward. The inference system has special rules for substring terms substr(t, v, w), using arithmetic entailments from Sect. 3 to show prefix and suffix relationships with the base string t. For negative containment, the rules of the inference system together can show that a (possibly non-constant) string cannot occur in a constant string by reasoning that its characters cannot appear in order in that string. We write l1 \ l2 to denote the empty string if l1 does not contain l2, or the result of removing the smallest prefix of l1 that contains l2 from l1 otherwise.


Example 8. Let t be abcab and let s be con(b, x, a, y, c). String s is not contained in t for any value of x, y. We derive ⊢ t ∌ s using two applications of the rightmost rule for negative containment in Fig. 5, noting abcab \ b = cab, cab \ a = b, and b does not contain c. In other words, the containment does not hold since the characters b, a and c cannot be found in order in the constant abcab.
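The reasoning of Example 8 can be sketched as a left-to-right scan in the spirit of the negative-containment rules of Fig. 5. In this hypothetical encoding, a pattern is a list whose entries are literals or None for an unconstrained variable. A literal that cannot be found ends the scan with a refutation (justified semantically: if l does not contain l2, it cannot contain con(l2, ...)). Otherwise the scan continues in l \ l2. A False result only means that no refutation was found.

```python
from itertools import product

def strip_through(l1: str, l2: str) -> str:
    r"""l1 \ l2: the empty string if l1 does not contain l2, otherwise l1 with
    its smallest prefix that contains l2 removed."""
    i = l1.find(l2)
    return "" if i < 0 else l1[i + len(l2):]

def definitely_not_contained(l: str, parts: list) -> bool:
    """Try to show that the literal l contains con(*parts) in no model.
    parts holds literals (str) or None for an unconstrained string variable."""
    rest = l
    for part in parts:
        if part is None:
            # Drop the variable: if rest cannot contain the remaining pattern,
            # it cannot contain the pattern with the variable in front either.
            continue
        if part not in rest:
            # rest does not contain this literal, hence not con(literal, ...).
            return True
        # Continue in rest \ part, as in the rule with premise l1 \ l2.
        rest = strip_through(rest, part)
    return False  # no refutation found; this does not prove containment

def words(alphabet: str, max_len: int) -> list:
    """All strings over alphabet of length at most max_len."""
    return ["".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k)]
```

On Example 8, `definitely_not_contained("abcab", ["b", None, "a", None, "c"])` follows exactly the chain abcab \ b = cab, cab \ a = b and then fails to find c.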


4.1 Simplification Rules Based on String Containment 


Figure 6 gives rules for simplifying extended function terms based on the aforementioned judgments pertaining to string containment. First, equalities can be rewritten to false and applications of contains can be rewritten to a constant based on the appropriate judgment of our inference system. Applications of indexof can be simplified to −1 if it can be shown that the second argument does not appear in the suffix of the first argument starting at the position given by the third argument. The next two rules reason about cases where the second argument s definitely occurs in the first argument starting from position v. In this case, if we additionally know that s occurs within (beyond) a prefix t of the first argument, then the suffix r (prefix t) can be dropped, where the start position and the return value of the result are modified accordingly. If we know s is a prefix of the first argument at position v, then the result is v if indeed v is in the bounds of t. Notice that the latter condition is necessary to handle the case where s is the empty string. The three rules for replace are analogous. First, the replace rewrites to the first argument if we know it does not contain the second argument s. If we know s is definitely contained in a prefix of the first argument, then we can pull the remainder of that string upwards. Finally, if we know s is a prefix of the first argument, then we can replace that prefix with r while concatenating the remainder. We use the term substr(t, |s|) to denote the remainder after the replacement for the sake of brevity, although this term typically does not involve extended functions after simplification, e.g. replace(con(x, y), x, z) → con(z, y) noting that (substr(con(x, y), |x|))↓ = y, or replace(ab, a, x) → con(x, b) noting that (substr(ab, |a|))↓ = b.

    t ≈ s → ⊥                                       if ⊢ t ∌ s
    contains(t, s) → ⊥                              if ⊢ t ∌ s
    contains(t, s) → ⊤                              if ⊢ t ∋ s
    indexof(t, s, v) → −1                           if ⊢ substr(t, v) ∌ s
    indexof(con(t, r), s, v) → indexof(t, s, v)     if ⊢ substr(t, v) ∋ s
    indexof(con(t, r), s, v) → indexof(r, s, v − |t|) + |t|   if ⊢ substr(con(t, r), v) ∋ s and ⊢ v ≥ |t|
    indexof(t, s, v) → v                            if ⊢ substr(t, v) ∋ᵖ s and ⊢ v ≤ |t|
    replace(t, s, r) → t                            if ⊢ t ∌ s
    replace(con(t, q), s, r) → con(replace(t, s, r), q)   if ⊢ t ∋ s
    replace(t, s, r) → con(r, substr(t, |s|))       if ⊢ t ∋ᵖ s

Fig. 6. Simplification rules based on string containment.


4.2 Simplifications Based on Equivalence of String Containment 


We further refine our approach based on inferring when one containment is 
equivalent to another one. For example, con(a, x) is contained in con(b, y) if and 
only if con(a, x) is contained in y alone. We introduce simplifications for such 
equivalences by reasoning about the maximal overlap between two strings. 

We adapt and extend the notation given in previous work [15]. Given string literals l1 and l2, the sufficient left overlap of l1 and l2, written l1 ⊔_l l2, is the largest suffix of l1 that is a prefix of l2 or has l2 as a prefix. For example, we have abc ⊔_l cd = c, abc ⊔_l b = bc, and abc ⊔_l ba = ε. We extend this definition to arbitrary strings s such that l1 ⊔_l s is equivalent to l1 ⊔_l l2 for the largest constant prefix l2 of s, where l2 is the empty string if s does not have a constant prefix. For example, we have abc ⊔_l con(cde, y) = c, abc ⊔_l con(b, y) = bc, and abc ⊔_l con(a, y) = abc. We define the dual operator sufficient right overlap, written l1 ⊔_r l2, which is the largest prefix of l1 that is a suffix of l2 or has l2 as a suffix, e.g. abc ⊔_r b = ab, and extend this to arbitrary strings in an analogous way. The sufficient left (resp., right) overlap operator can be used to determine how much of a constant string prefix l1 (resp., suffix) can be safely removed from a string without impacting whether it contains another string.
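Both overlap operators can be computed by scanning candidate suffixes (resp., prefixes) from longest to shortest. The Python sketch below uses hypothetical function names and covers literal arguments only; for an arbitrary s one would pass its largest constant prefix (resp., suffix). The brute-force helper checks, on short constant strings only, the property stated above: keeping just the overlap of a literal suffix or prefix does not change containment.

```python
from itertools import product

def left_overlap(l1: str, l2: str) -> str:
    """Sufficient left overlap: the largest suffix of l1 that is a prefix of l2
    or has l2 as a prefix."""
    for i in range(len(l1) + 1):            # i = 0 tries the whole of l1 first
        suffix = l1[i:]
        if l2.startswith(suffix) or suffix.startswith(l2):
            return suffix
    return ""                               # not reached: "" is a prefix of l2

def right_overlap(l1: str, l2: str) -> str:
    """Sufficient right overlap: the largest prefix of l1 that is a suffix of l2
    or has l2 as a suffix."""
    for j in range(len(l1), -1, -1):        # j = len(l1) tries the whole of l1 first
        prefix = l1[:j]
        if l2.endswith(prefix) or prefix.endswith(l2):
            return prefix
    return ""                               # not reached: "" is a suffix of l2

def overlap_rules_agree(alphabet: str = "ab", max_len: int = 3) -> bool:
    """Dropping the part of a literal outside the overlap preserves containment."""
    ws = ["".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k)]
    for t, l, s in product(ws, repeat=3):
        if (s in t + l) != (s in t + right_overlap(l, s)):
            return False
        if (s in l + t) != (s in left_overlap(l, s) + t):
            return False
    return True
```

The examples from the text, such as abc ⊔_l cd = c and abc ⊔_r b = ab, come out as expected.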


    contains(con(t, l), s) → contains(con(t, l ⊔_r s), s)
    contains(con(l, t), s) → contains(con(l ⊔_l s, t), s)
    indexof(con(t, l), s, v) → indexof(con(t, l ⊔_r s), s, v)
    indexof(con(l, t), s, v) → indexof(con(l2, t), s, v − |l1|) + |l1|
                               if l = l1 · l2 and l2 = l ⊔_l s and ⊢ substr(con(l, t), v) ∋ s
    replace(con(t, l), s, r) → con(replace(con(t, l1), s, r), l2)   if l = l1 · l2 and l1 = l ⊔_r s
    replace(con(l, t), s, r) → con(l1, replace(con(l2, t), s, r))   if l = l1 · l2 and l2 = l ⊔_l s

Fig. 7. Simplification rules based on equivalence of string containment. We write l, l1, l2 to denote string literals, v, w to denote integer terms and t, s to denote string terms.


The rules in Fig. 7 simplify extended terms by considering string overlaps. The first two rules drop parts of string literals from the suffix or prefix of their first arguments. The two rules for indexof are similar: a suffix of the first argument can be dropped if it does not contribute to whether it contains the second argument. A prefix of the first argument of indexof can be dropped if it does not contribute to containment, but only in the case where we know the second argument is definitely contained in the first argument. This is to guard against the case where the entire indexof term returns −1. The rules for replace are similar to those for contains, except that the suffix (resp., prefix) of the first argument is pulled upwards instead of being dropped.


5 Multiset-Based String Simplification 


Next, we introduce simplifications based on reasoning about strings as multisets, i.e. collections of unordered characters. Such reasoning is sufficient for showing that equalities like con(a, x) ≈ con(x, b) are equivalent to ⊥, since the left side of the equality contains exactly one more occurrence of character a than the right-hand side. Similar to arithmetic reasoning from Sect. 3, we use approximations when reasoning about strings as multisets. We define the multiset abstraction of t, written M_t, as the multiset {t1, ..., tn} where t is equivalent to con(t1, ..., tn) and all constants in this set are characters. For example, M_con(aba, x) = {a, a, b, x}. We define a rewrite system ⊒_M over strings where a rewritten string over-approximates the original string in the following sense: if t ⊒_M s, then for all models of T_S and any character c, the number of occurrences of c in the strings in M_s is greater than or equal to the number of occurrences in the strings in M_t.

Figure 8 lists the rules for the rewrite system ⊒_M and the simplifications based on multiset reasoning. Given a predicate contains(t, s), if over-approximating t with respect to the rules of ⊒_M results in a string r, and it can be determined that s contains strictly more occurrences of some character c than r, then it cannot be the case that s is contained in t. To establish this, we check whether the multiset difference of M_s and M_r contains c, and conversely the difference of M_r and M_s contains only character constants which are distinct from c. In the second rule, if one side of an equality can be determined to contain only a character c, then one occurrence of that character can be dropped from both sides of the equality, since the relative position of that character does not matter. The three rules for ⊒_M state that the multiset abstraction of a term of the form substr(t, v, w) can be over-approximated as the entire string t; a term replace(t, s, r) can be over-approximated as a string having both t and r; and over-approximation can be applied to the children of con terms.


    contains(t, s) → ⊥   if t ⊒*_M r, M_s \ M_r = {c, s1, ..., sn} and
                            M_r \ M_s = {c1, ..., cm} with c ∉ {c1, ..., cm}
    con(t, c, s) ≈ con(q, c, r) → con(t, s) ≈ con(q, r)   if con(t, c, s) ⊒*_M p and M_p = {c, ..., c}

    substr(t, v, w) ⊒_M t
    replace(t, s, r) ⊒_M con(t, r)
    con(t, s, r) ⊒_M con(t, q, r)   if s ⊒_M q

Fig. 8. Simplification rules based on multiset reasoning. We write c, c1, ... to denote characters, v, w to denote integer terms, and t, s, r, q, p to denote string terms.


Example 9. We have that con(aaa, substr(x, y1, y2)) ≈ con(x, b) → ⊥ by noting that con(aaa, substr(x, y1, y2)) ⊒*_M con(aaa, x), M_con(aaa, x) = {a, a, a, x} and M_con(x, b) = {b, x}. The difference of the latter with the former is {b}, and the former with the latter is {a, a, a}. Thus, the right side of the equality contains at least one more occurrence of b than the left side; hence, the equality is equivalent to false.
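A Python sketch of the multiset argument behind the first rule of Fig. 8, assuming a hypothetical tuple encoding of terms. Only t is over-approximated with the ⊒_M rules. The string s is abstracted exactly, with substr or replace terms inside s kept opaque, since over-approximating the contained string would be unsound. `collections.Counter` subtraction keeps only positive counts, which matches the multiset difference used in the text.

```python
from collections import Counter

# Hypothetical encoding for this sketch: a Python str is a string literal,
# ("var", "x") a string variable, ("int", "y") an integer variable, and
# ("con", ...), ("substr", t, v, w), ("replace", t, s, r) applications.

def components(term, approximate: bool) -> list:
    """Flatten con applications; if `approximate`, also apply the rules
    substr(t, v, w) => t and replace(t, s, r) => con(t, r)."""
    if isinstance(term, str):
        return [term]
    kind = term[0]
    if kind == "con":
        return [c for child in term[1:] for c in components(child, approximate)]
    if approximate and kind == "substr":
        return components(term[1], approximate)
    if approximate and kind == "replace":
        return components(term[1], approximate) + components(term[3], approximate)
    return [term]                      # variables and other terms stay opaque

def multiset(parts: list) -> Counter:
    """Multiset abstraction: literals are split into single characters."""
    m = Counter()
    for p in parts:
        if isinstance(p, str):
            m.update(p)                # one count per character
        else:
            m[p] += 1
    return m

def refutes_containment(t, s) -> bool:
    """True if the multiset argument shows that t cannot contain s."""
    m_r = multiset(components(t, approximate=True))    # r over-approximates t
    m_s = multiset(components(s, approximate=False))   # s is not approximated
    extra_in_s = m_s - m_r
    extra_in_r = m_r - m_s
    has_char = any(isinstance(k, str) for k in extra_in_s)
    only_chars = all(isinstance(k, str) for k in extra_in_r)
    return has_char and only_chars
```

Because an equality t ≈ s implies contains(t, s), a refutation also rewrites the equality of Example 9 to ⊥.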


6 Implementation 


We implemented the above simplification rules and others in the DPLL-based SMT solver CVC4, which implements a theory solver for a basic fragment of word equations with length, several other theory solvers, and reduction techniques for extended string functions as described in Sect. 2.1. Our simplification rules are run in a preprocessing pass as well as an inprocessing pass during solving. For the latter, we use a context-dependent simplification strategy that infers when an extended string constraint, e.g., ¬contains(t, s), simplifies to ⊥ based on other assertions, e.g., s ≈ ε. Our simplification techniques do not affect the
core procedure for the theory of strings, nor the compatibility of the string solver 
with other theories. In total, our implementation is about 3,500 lines of C++ 
code. We cache the results of the simplifications and the approximation-based 
arithmetic entailments to amortize their costs. 

Additional Simplification Rules. The simplification rules in this paper are a subset of the rules in the implementation. We omit other uncategorized rules for lack of space. Many of these apply to specific term patterns, such as cases where two nested applications of substr can be combined; cases where an application of replace can be eliminated by case splitting; and other cases like con(t, t) ≈ a → ⊥. An example of such rules is contains(replace(t, w1, w2), w3) → contains(t, w3) if w3 does not overlap with either w1 or w2, because the replace does not change whether t contains w3 or not. Another class of rules only applies to strings of length one because they cannot span multiple components of a concatenation, e.g. contains(con(t, s), c) → contains(t, c) ∨ contains(s, c) where c is a character. Finally, there are rewrites that benefit from multiple techniques presented in this paper. For example, we have a rewrite that splits string equations into multiple smaller equations if it can determine that prefixes must have the same length: con(a, t, s) ≈ con(t, b, r) → con(a, t) ≈ con(t, b) ∧ s ≈ r → ⊥.


Validating Simplification Rules. The correctness of our simplification tech- 
niques is critical to the soundness of the overall solver. Due to the sophistication 
and breadth of those techniques, it is challenging to formally verify our imple- 
mentation. As a pragmatic alternative, we periodically test our implementation 
using a testing infrastructure we developed for this purpose. We found this to be 
critical in our development process. Our testing infrastructure allows the devel- 
oper to specify a context-free grammar in the syntax-guided synthesis format [2]. 
We generate all terms ¢ in this grammar up to a fixed size and test the equiva- 
lence of t and its simplified form t| on a set of randomly generated points. The 
most recent run of this system on two grammars (one for extended string terms 
and another for string predicates) up to a term size of three, validated 319,867 
simplifications of string terms and 188,428 simplifications of string predicates on 
1,000 sample points. This run took 924s for string terms and 971s for the string 
predicates using the same hardware as in Sect. 7. 
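
The testing idea can be illustrated at a much smaller scale. The Python sketch
below is hypothetical (the helper names are ours, and Python's `in` and `+` stand
in for the SMT-LIB semantics of contains and con): it evaluates a term and its
rewritten form on randomly generated points and reports the first disagreement.
The single-character rule from the text survives 1,000 random points, whereas the
unsound generalization to an arbitrary string w is refuted, since w can span the
boundary between t and s. Passing such a test is evidence, not a proof.

```python
import random


def con(*parts):
    return "".join(parts)


def contains(t, w):
    return w in t


def rand_string(rng, alphabet="ab", max_len=4):
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))


def rand_char(rng, alphabet="ab"):
    return rng.choice(alphabet)


def find_counterexample(original, simplified, generators, samples=1000, seed=0):
    """Evaluate both sides on `samples` random points; return the first point
    where they disagree, or None if no disagreement was found."""
    rng = random.Random(seed)
    for _ in range(samples):
        point = [gen(rng) for gen in generators]
        if original(*point) != simplified(*point):
            return point
    return None


# Rule from the text: contains(con(t, s), c) ~> contains(t, c) or contains(s, c).
sound = find_counterexample(
    lambda t, s, c: contains(con(t, s), c),
    lambda t, s, c: contains(t, c) or contains(s, c),
    [rand_string, rand_string, rand_char])

# Unsound generalization: c replaced by an arbitrary string w.
unsound = find_counterexample(
    lambda t, s, w: contains(con(t, s), w),
    lambda t, s, w: contains(t, w) or contains(s, w),
    [rand_string, rand_string, rand_string],
    samples=20000)

print(sound)    # None
print(unsound)  # a point [t, s, w] where w spans the boundary of t and s
```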


7 Evaluation 


We evaluate the impact of each simplification technique as implemented in CVC4
on three benchmark sets that use extended string operators: CMU, a dataset 
obtained from symbolic execution of Python code [15]; TERMEQ, a benchmark 
set consisting of the verification of term equivalences over strings [14]; and SLOG, 
a benchmark set extracted from vulnerability testing of web applications [22]. 
The SLOG set uses the replace function extensively but does not contain other 
extended functions. We also evaluate the impact on APLAS, a set of handcrafted 
benchmarks involving looping word equations [10] (string equalities whose left 
and right sides have variables in common). 

We compare CVC4 with Z3 commit 9cb1a0f [8],⁴ a state-of-the-art string
solver. Additionally, we compare against OSTRICH on the SLOG benchmarks but
not other sets because it does not support some functions such as contains and
indexof. We omit a comparison with Z3STR3 4.8.4 because we found multiple
issues in its latest release including wrong answers, which we have reported to
the authors. We also omit a comparison with S3# due to differing semantics. We
compare four configurations of CVC4: all, which enables all optimizations; -arith,
which disables arithmetic-based simplification techniques (discussed in Sect. 3);
-contain, which disables containment-based simplification techniques (discussed
in Sect. 4); and -msets, which disables multiset-based simplification techniques
(discussed in Sect. 5). Additionally, to test the applicability of our techniques to
other solvers, we test the effect of our simplifications on Z3 by using CVC4 to
generate simplified benchmarks and then running Z3 on those benchmarks. We
generate two sets of simplified benchmarks, simplified by CVC4 with (Z3_S)
and without (Z3_N) the simplification techniques presented in this paper.

⁴ 9cb1a0f is newer than the current release 4.8.4 and includes several fixes for critical
issues.


Table 1. Number of solved problems per benchmark set. A dash (–) marks a benchmark
set not supported by a solver. “R%” indicates the reduction of extended string
functions during preprocessing. Rows marked × count unsolved instances. Z3_N and
Z3_S denote Z3 run on benchmarks simplified by CVC4 without and with the techniques
of this paper, respectively. All benchmarks ran with a timeout of 600s.

Set     Result      all   -arith  -contain   -msets       Z3     Z3_N     Z3_S  OSTRICH    R%
CMU     sat        5703     5535      5703     5703     2343     3923     3943        –   32%
        unsat        65       29        65       65       50       58       61        –
        ×           154      358       154      154     3529     1941     1918        –
TERMEQ  sat          10       10        10       10        4        5        5        –   68%
        unsat        51       37        28       51       35       40       60        –
        ×            19       33        42       19       41       35       15        –
SLOG    sat        1302     1302      1302     1302     1133     1225     1225     1304   27%
        unsat      2082     2082      2082     2082     2080     2080     2080     2082
        ×             7        7         7        7      178       86       86        5
APLAS   sat         135      135       135      135        9       51       46        –   n/a
        unsat       292      292       171      171       94      129      292        –
        ×           159      159       280      280      483      406      248        –
Total   sat        7150     6982      7150     7150     3489     5204     5219     1304
        unsat      2490     2440      2346     2369     2259     2307     2493     2082
        ×           339      557       483      460     4231     2468     2267        5


We ran all benchmarks on a cluster equipped with Intel E5-2637 v4 CPUs
running Ubuntu 16.04 and dedicated one core, 8GB RAM, and 600s for each
job. Table 1 summarizes the number of solved instances for each configuration
and the baseline solvers grouped by benchmark sets. We remark that the average
reduction of extended string functions (with all simplification techniques
enabled) shown in column “R%” is significant on all benchmark sets that contain
extended string functions. The scatter plots in Fig. 9 detail the effects of
disabling each family of simplifications. They distinguish between satisfiable and
unsatisfiable instances. To emphasize non-trivial benchmarks, we omit the
benchmarks that are solved in less than a second by all solvers.

Fig. 9. Scatter plots showing the impact of disabling simplification techniques in CVC4
on both satisfiable and unsatisfiable benchmarks. All benchmarks ran with a timeout
of 600s.

The arithmetic-based simplification techniques have the most significant
performance impact on the symbolic execution benchmarks CMU. The number of
solved benchmarks is significantly lower when disabling those techniques (5,564
instead of 5,768). The scatter plot shows that, for longer-running satisfiable
queries, a large portion of the benchmarks are solved up to an order of magnitude
faster with the simplifications. These improvements in runtime on the CMU set are
particularly compelling because they come from a symbolic execution application,
which involves a large number of queries with a short timeout. The improvements
are more pronounced for unsatisfiable benchmarks, where our results show that
simplifications often give the solver the ability to derive a refutation in a
matter of seconds, something that is infeasible with configurations without these
techniques. The APLAS set contains no extended string operators and hence our
arithmetic-based simplification techniques have little impact on this set.

In contrast, both containment and multiset-based rewrites have a high impact 
on the APLAS set, as -contain and -msets both solve 121 fewer benchmarks. 
Additionally, -contain has a high impact on the TERMEQ set, where the sim- 
plifications enable the best configuration to solve 61 out of 80 benchmarks. 
Since these techniques apply most frequently to looping word equations, they 
are less important for the CMU set, which does not have such equations. The 
containment-based and multiset-based techniques primarily help on unsatisfiable 
benchmarks, as shown in the scatter plots. On TERMEQ benchmarks, it tends 
to be easier to find counterexamples, i.e. to solve the satisfiable ones, so there is 
more to gain on unsatisfiable benchmarks. 

On SLOG, OSTRICH solves two more instances than CVC4 but CVC4 is over 50
times faster on commonly solved instances while supporting a richer set of string
operators. On all benchmark sets, CVC4 solves at least as many benchmarks as
Z3, and CVC4 has 12x fewer timeouts than Z3 (339 versus 4,231 unsolved instances
in total). On the simplified benchmarks, Z3 performs significantly better. On the
CMU and the APLAS benchmarks, Z3_N outperforms Z3 by a large margin.
Additionally simplifying the benchmarks with the techniques presented in this
paper improves performance further on most benchmark sets and allows Z3_S to
solve the most unsatisfiable benchmarks overall. These results indicate that Z3
could benefit from additional simplifications, and they underscore the importance
of curating and publishing simplification techniques in order to improve the
state of the art.
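
The totals and the 12x ratio quoted above can be rechecked from the per-set rows
of Table 1. The Python sketch below transcribes those rows as (sat, unsat,
unsolved) triples, where Z3_N and Z3_S stand for Z3 run on benchmarks simplified
by CVC4 without and with the techniques of this paper; it also checks that every
configuration accounts for the same number of instances per set.

```python
# (sat, unsat, unsolved) per benchmark set, transcribed from Table 1.
TABLE = {
    "all":      {"CMU": (5703, 65, 154),  "TERMEQ": (10, 51, 19), "SLOG": (1302, 2082, 7),   "APLAS": (135, 292, 159)},
    "-arith":   {"CMU": (5535, 29, 358),  "TERMEQ": (10, 37, 33), "SLOG": (1302, 2082, 7),   "APLAS": (135, 292, 159)},
    "-contain": {"CMU": (5703, 65, 154),  "TERMEQ": (10, 28, 42), "SLOG": (1302, 2082, 7),   "APLAS": (135, 171, 280)},
    "-msets":   {"CMU": (5703, 65, 154),  "TERMEQ": (10, 51, 19), "SLOG": (1302, 2082, 7),   "APLAS": (135, 171, 280)},
    "Z3":       {"CMU": (2343, 50, 3529), "TERMEQ": (4, 35, 41),  "SLOG": (1133, 2080, 178), "APLAS": (9, 94, 483)},
    "Z3_N":     {"CMU": (3923, 58, 1941), "TERMEQ": (5, 40, 35),  "SLOG": (1225, 2080, 86),  "APLAS": (51, 129, 406)},
    "Z3_S":     {"CMU": (3943, 61, 1918), "TERMEQ": (5, 60, 15),  "SLOG": (1225, 2080, 86),  "APLAS": (46, 292, 248)},
}


def totals(config):
    """Sum the (sat, unsat, unsolved) triples of `config` over all benchmark sets."""
    return tuple(sum(column) for column in zip(*TABLE[config].values()))


def set_sizes(config):
    """Number of instances per benchmark set (independent of the configuration)."""
    return {name: sum(row) for name, row in TABLE[config].items()}


print(totals("all"))                       # (7150, 2490, 339)
print(totals("Z3")[2] / totals("all")[2])  # about 12.48
assert all(set_sizes(c) == set_sizes("all") for c in TABLE)
```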


8 Conclusion 


We have presented a set of aggressive simplification techniques for reasoning 
about extended string constraints. Our results suggest that such techniques are 
key to advancing the state of the art in SMT string solving. Arithmetic-based 
simplifications lead to significant speedups in benchmarks from a symbolic execu- 
tion application, while containment and multiset-based simplifications improve 
the performance on problems consisting of difficult term equivalences and looping
word equations. Our approach is not limited to CVC4 and can be adapted to
other solvers.

Given the encouraging results for each of the simplification techniques in our 
evaluation, we plan to extend them to other types of abstraction and make them 
context-aware. The latter extension involves taking into account other assertions 
when checking whether a side condition of a rule is fulfilled. 
Abstract. We introduce first-order alternating automata, a generalization
of boolean alternating automata, in which transition rules are
described by multisorted first-order formulae, with states and internal 
variables given by uninterpreted predicate terms. The model is closed 
under union, intersection and complement, and its emptiness problem 
is undecidable, even for the simplest data theory of equality. To cope 
with the undecidability problem, we develop an abstraction refinement 
semi-algorithm based on lazy annotation of the symbolic execution paths 
with interpolants, obtained by applying (i) quantifier elimination with 
witness term generation and (ii) Lyndon interpolation in the quantifier- 
free theory of the data domain, with uninterpreted predicate symbols. 
This provides a method for checking inclusion of timed and finite-memory 
register automata, and emptiness of quantified predicate automata, pre- 
viously used in the verification of parameterized concurrent programs, 
composed of replicated threads, with shared memory. 


1 Introduction 


Many results in automata theory rely on the finite alphabet hypothesis, which 
guarantees, in some cases, the existence of determinization, complementation 
and inclusion checking methods. However, this hypothesis prevents the use of 
automata as models of real-time systems or even simple programs, whose input 
and output are data values ranging over very large domains, typically viewed as 
infinite mathematical abstractions. 

Traditional attempts to generalize classical Rabin-Scott automata to infinite
alphabets, such as timed automata [1] and finite-memory automata [16], face
the complement closure problem: there exist automata for which the complement
language cannot be recognized by an automaton in the same class. This
makes it impossible to encode a language inclusion problem
$\mathcal{L}(A) \subseteq \mathcal{L}(B)$ as the emptiness of an automaton recognizing
the language $\mathcal{L}(A) \cap \mathcal{L}^c(B)$, where $\mathcal{L}^c(B)$ denotes the
complement of $\mathcal{L}(B)$.

Even for finite alphabets, complementation of finite-state automata faces
inherent exponential blowup, due to nondeterminism. However, if we allow universal
nondeterminism, in addition to the classical existential nondeterminism,
complementation is possible in linear time. Having both existential and universal
nondeterminism defines the alternating automata model [4]. A finite-alphabet
alternating automaton is described by a set of transition rules $q \xrightarrow{a} \phi$,
where $q$ is a state, $a$ is an input symbol and $\phi$ is a boolean formula, whose
propositional variables denote successor states.
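
The linear-time complementation mentioned above can be sketched for the
finite-alphabet, propositional case. The Python code below is a minimal
illustration under our own encoding (formulas as nested tuples, rules as a
dictionary), not the first-order construction of this paper: acceptance is
computed backwards over the word, and the complement dualizes every formula
(swapping ∧ with ∨ and true with false) and complements the set of final states,
so its size stays linear. The example automaton, which accepts the words
containing both a and b by branching universally into two states, is hypothetical.

```python
from itertools import product

# Positive boolean formulas: ("var", q), ("and", f, g), ("or", f, g), ("true",), ("false",)


def holds(f, good):
    """Evaluate f, where exactly the states in `good` are true."""
    tag = f[0]
    if tag == "var":
        return f[1] in good
    if tag in ("true", "false"):
        return tag == "true"
    left, right = holds(f[1], good), holds(f[2], good)
    return (left and right) if tag == "and" else (left or right)


def dual(f):
    """Swap and/or and true/false, keeping variables (linear in |f|)."""
    swap = {"and": "or", "or": "and", "true": "false", "false": "true"}
    if f[0] == "var":
        return f
    return (swap[f[0]],) + tuple(dual(g) for g in f[1:])


def accepts(aut, word):
    states, alphabet, init, delta, final = aut
    good = set(final)  # states that accept the empty suffix
    for a in reversed(word):
        # a missing rule behaves like the formula false
        good = {q for q in states if holds(delta.get((q, a), ("false",)), good)}
    return holds(init, good)


def complement(aut):
    states, alphabet, init, delta, final = aut
    new_delta = {(q, a): dual(delta.get((q, a), ("false",)))
                 for q in states for a in alphabet}
    return states, alphabet, dual(init), new_delta, set(states) - set(final)


def V(q):
    return ("var", q)


BOTH = ({"pa", "pb", "f"}, "ab", ("and", V("pa"), V("pb")),
        {("pa", "a"): V("f"), ("pa", "b"): V("pa"),
         ("pb", "b"): V("f"), ("pb", "a"): V("pb"),
         ("f", "a"): V("f"), ("f", "b"): V("f")},
        {"f"})

print(accepts(BOTH, "ab"), accepts(complement(BOTH), "ab"))  # True False
```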

Our Contribution. We extend alternating automata to infinite data alphabets,
by defining a model of computation in which all boolean operations, including
complementation, can be done in linear time. The control states are given by
k-ary predicate symbols $q(y_1, \ldots, y_k)$, the input consists of an event $a$ from a
finite alphabet and a tuple of data variables $x_1, \ldots, x_n$, ranging over an infinite
domain, and transitions are of the form
$q(y_1, \ldots, y_k) \xrightarrow{a(x_1, \ldots, x_n)} \phi(x_1, \ldots, x_n, y_1, \ldots, y_k)$,
where $\phi$ is a formula in the first-order theory of the data domain. In this model,
the arguments of a predicate atom $q(y_1, \ldots, y_k)$ represent the values of the
internal variables associated with the state. Together with the input values
$x_1, \ldots, x_n$, these values define the next configurations, but remain invisible in
the input sequence.

The tight coupling of internal values and control states, by means of unin- 
terpreted predicate symbols, allows for linear-time complementation just as in 
the case of classical propositional alternating automata. Complementation is, 
moreover, possible when the transition formulae contain first-order quantifiers, 
generating infinitely-branching execution trees. The price to be paid for this 
expressivity is that emptiness of first-order alternating automata is undecidable, 
even for the simplest data theory of equality [6]. 

The main contribution of this paper is an effective emptiness checking semi- 
algorithm for first-order alternating automata, in the spirit of the IMPACT lazy 
annotation procedure, originally developed for checking safety of nondetermin- 
istic integer programs [20,21]. In a nutshell, a lazy annotation procedure unfolds 
an automaton $A$ trying to find an execution that recognizes a word from $\mathcal{L}(A)$.
If a path that reaches a final state does not correspond to a concrete run of 
the automaton, the positions on the path are labeled with interpolants from the 
proof of infeasibility, thus marking this path and all continuations as infeasible 
for future searches. Termination of lazy annotation procedures is not guaran- 
teed, but having a suitable coverage relation between the nodes of the search 
tree may ensure convergence on many real-life examples. However, applying lazy
annotation to first-order alternating automata faces two nontrivial problems: 


1. Quantified transition rules make it hard, if not impossible, in general, to 
decide if a path is infeasible. This is mainly because adding uninterpreted 
predicate symbols to decidable first-order theories, such as Presburger arith- 
metic, results in undecidability [10]. To deal with this problem, we assume 
that the first-order data theory, without uninterpreted predicate symbols, has 
a quantifier elimination procedure, that instantiates quantifiers with effec- 
tively computable witness terms. 

2. The interpolants that prove the infeasibility of a path are not local, as they 
may refer to input values encountered in the past. However, the future exe- 
cutions are oblivious to when these values have been seen in the past and 
depend only on the relation between the past and current values. We use this 
fact to define a labeling of nodes, visited by the lazy annotation procedure, 
with conjunctions of existentially quantified interpolants combining predicate
atoms with data constraints. 


We use first-order alternating automata to develop practical semi-algorithms 
for a number of known undecidable problems, such as: inclusion of regular timed 
languages [1], inclusion of quasi-regular languages recognized by finite-memory 
automata [16] and emptiness of predicate automata, a subclass of first-order 
alternating automata used to verify parameterized concurrent programs [6,7]. 

Related Work. Recognizers for languages over infinite alphabets have found
various applications, ranging from Unicode text recognition [5] to runtime pro- 
gram monitoring [2]. Extending finite automata to infinite alphabets has been 
considered in the context of symbolic alternating finite automata (s-AFA), whose 
transitions are labeled with guards taken from a decidable theory of the data 
domain [5]. As in our model, s-AFA are closed under union, intersection and
complement; unlike in our model, their emptiness is decidable, due to the lack of registers. However,
s-AFA are strictly less expressive than our model, because comparing data at 
different positions in the input word is not possible. 

Constrained Horn clauses (CHC) are a branching computation model 
widespread in program verification [9]. The main difference between alternat- 
ing and bottom-up branching computations is that, in an alternating model, all 
branches of the computation must synchronize on the same input word. With this 
in mind, it is possible to express emptiness of first-order alternating automata as 
the existence of solutions of a CHC over a higher-order theory of data, extended 
with algebraic data types (lists). The effectiveness of such an encoding depends 
on the effectiveness of interpolation and witness term generation for theories of 
algebraic data types [11]. 

The alternating automata model presented in this paper extends the alter- 
nating automata with variables ranging over infinite data considered in [14]. 
There all variables were required to be observable in the input. We overcome 
this restriction by allowing internal (invisible) variables. Another closely related 
work [13] considers an inclusion between an asynchronous product of automata 
$A_1 \times \ldots \times A_n$, extended with data variables, and a monitor automaton $B$. The
semi-algorithm defined there was based on the assumption that all variables of
the observer $B$ must be declared in the automata $A_1, \ldots, A_n$ under check. This
limitation can now be bypassed, since the inclusion problem can be encoded as 
emptiness of a first-order alternating automaton and, moreover, the emptiness 
checking semi-algorithm can handle invisible variables. 

The work probably closest to ours concerns the model of predicate automata 
(PA) [6,7,17], used in the verification of parameterized concurrent programs 
with shared memory. In this model, the alphabet consists of pairs of program 
statements and thread identifiers and is considered infinite because the num- 
ber of threads is unbounded. Since thread identifiers can only be compared for 
equality, the data theory in PA is the theory of equality. Even with this simplifica- 
tion, the emptiness problem is undecidable when either the predicates have arity 
greater than one [6] or use quantified transition rules [17]. Checking emptiness 
of quantifier-free PA is possible semi-algorithmically, by explicitly enumerating 
reachable configurations and checking coverage by looking for permutations of
argument values. However, no semi-algorithm has been given for quantified PA. 
Dealing with quantified transition rules is one of our contributions. 


1.1 Preliminaries 


For two integers $0 \leq i \leq j$, we define $[i, j] = \{i, \ldots, j\}$ and $[i] = [0, i]$. We consider
two disjoint sorts $D$ and $B$, where $D$ is an infinite domain and $B = \{\top, \bot\}$ is the
set of boolean values true ($\top$) and false ($\bot$), respectively. The $D$ sort is equipped
with countably many function symbols $f : D^{\#(f)} \rightarrow D \cup B$, where $\#(f) \geq 0$
denotes the number of arguments (arity) of $f$. A predicate is a function symbol
$p : D^{\#(p)} \rightarrow B$, that is, a $\#(p)$-ary relation.

We consider the interpretation of all function symbols $f : D^{\#(f)} \rightarrow D$ to be
fixed by the interpretation of the $D$ sort; for instance, if $D$ is the set of integers
$\mathbb{Z}$, these are zero, the successor function and the arithmetic operations of
addition and multiplication. We extend this convention to several predicates over $D$,
such as the inequality relation over $\mathbb{Z}$, and write $\mathrm{Pred}$ for the set of
remaining uninterpreted predicates.

Let $\mathrm{Var} = \{x, y, z, \ldots\}$ be a countably infinite set of variables, ranging
over $D$. Terms are either constants of sort $D$, variables or function applications
$f(t_1, \ldots, t_{\#(f)})$, where $t_1, \ldots, t_{\#(f)}$ are terms. The set of first-order formulae is
defined by the syntax below:

$$\phi := t = s \mid p(t_1, \ldots, t_{\#(p)}) \mid \neg\phi_1 \mid \phi_1 \wedge \phi_2 \mid \exists x .\, \phi_1$$

where $t, s, t_1, \ldots, t_{\#(p)}$ denote terms and $p$ is a predicate symbol. We write
$\phi_1 \vee \phi_2$, $\phi_1 \rightarrow \phi_2$ and $\forall x .\, \phi_1$ for $\neg(\neg\phi_1 \wedge \neg\phi_2)$,
$\neg\phi_1 \vee \phi_2$ and $\neg\exists x .\, \neg\phi_1$, respectively.
$FV(\phi)$ is the set of free variables in $\phi$ and the size $|\phi|$ of a formula $\phi$ is the
number of symbols needed to write it down. A sentence is a formula $\phi$ with
no free variables. A formula is positive if each uninterpreted predicate symbol
occurs under an even number of negations and we denote by $\mathrm{Form}^+(Q, X)$ the
set of positive formulae with predicates from the set $Q \subseteq \mathrm{Pred}$ and free variables
from the set $X \subseteq \mathrm{Var}$. A formula is in prenex form if it is of the form
$\psi = Q_1 x_1 \ldots Q_n x_n .\, \phi$, where each $Q_i \in \{\exists, \forall\}$ and $\phi$ has no
quantifiers. In this case we call $\phi$ the matrix of $\psi$. Every first-order formula can
be written in prenex form, by renaming each quantified variable to a unique name
and moving the quantifiers upfront.
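
The syntax above, including the derived connectives and the notion of a positive
formula, can be mirrored by a small abstract syntax tree. In the Python sketch
below (a hypothetical encoding in which terms are plain strings and equality atoms
stand for the interpreted part), `is_positive` counts the negations above each
uninterpreted predicate atom. Because → and ∀ are defined through ¬, an
implication with a predicate atom on its left is not positive, whereas a guarded
formula such as ∀z . z = y → q(x+z) is.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Pred:
    """Uninterpreted predicate atom p(t1, ..., tk); terms are plain strings here."""
    name: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Eq:
    """Atom t = s of the (interpreted) data theory."""
    left: str
    right: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Exists:
    var: str
    body: "Formula"


Formula = Union[Pred, Eq, Not, And, Exists]


# Derived connectives, defined as in the text.
def Or(f, g):
    return Not(And(Not(f), Not(g)))


def Implies(f, g):
    return Or(Not(f), g)


def Forall(x, f):
    return Not(Exists(x, Not(f)))


def is_positive(f, negations=0):
    """True iff every predicate atom of f occurs under an even number of negations."""
    if isinstance(f, Pred):
        return negations % 2 == 0
    if isinstance(f, Eq):
        return True
    if isinstance(f, Not):
        return is_positive(f.arg, negations + 1)
    if isinstance(f, And):
        return is_positive(f.left, negations) and is_positive(f.right, negations)
    if isinstance(f, Exists):
        return is_positive(f.body, negations)
    raise TypeError(f"not a formula: {f!r}")


q = Pred("q", ("x+z",))
print(is_positive(Forall("z", Implies(Eq("z", "y"), q))))  # True
print(is_positive(Implies(q, Pred("r"))))                  # False
```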

An interpretation $I$ maps each predicate symbol $p$ into a set $p^I \subseteq D^{\#(p)}$, if
$\#(p) > 0$, or into an element of $B$ if $\#(p) = 0$. A valuation $\nu$ maps each variable
$x$ into an element of $D$. Given a term $t$, we denote by $t^\nu$ the value obtained
by replacing each variable $x$ by the value $\nu(x)$ and evaluating each function
application. For a formula $\phi$, we define the forcing relation $I, \nu \models \phi$ recursively
on the structure of $\phi$, as usual. For a formula $\phi$ and a valuation $\nu$, we define
$[\![\phi]\!]_\nu = \{I \mid I, \nu \models \phi\}$ and drop the $\nu$ subscript for sentences. A sentence $\phi$ is
satisfiable if $[\![\phi]\!] \neq \emptyset$. An element of $[\![\phi]\!]$ is called a model of $\phi$. A formula $\phi$ is
valid if $I, \nu \models \phi$ for every interpretation $I$ and every valuation $\nu$. We say that
$\phi$ entails $\psi$, written $\phi \models \psi$, if and only if $[\![\phi]\!] \subseteq [\![\psi]\!]$.
Interpretations are partially ordered by the pointwise subset order, defined
as $I_1 \subseteq I_2$ if and only if $p^{I_1} \subseteq p^{I_2}$ for each predicate symbol $p \in \mathrm{Pred}$. Given a
formula $\phi$ and a valuation $\nu$, we define
$[\![\phi]\!]^\mu_\nu = \{I \mid I, \nu \models \phi,\ \forall I' \subsetneq I .\ I', \nu \not\models \phi\}$,
the set of minimal interpretations that, together with $\nu$, form models of $\phi$.


2 First Order Alternating Automata 


Let $\Sigma$ be a finite alphabet of input events. Given a finite set of variables
$X \subseteq \mathrm{Var}$, we denote by $X \mapsto D$ the set of valuations of the variables $X$ and
let $\Sigma[X] = \Sigma \times (X \mapsto D)$ be the possibly infinite set of data symbols $(a, \nu)$, where $a$
is an input symbol and $\nu$ is a valuation. A data word (simply called word in the
following) is a finite sequence $w = (a_1, \nu_1)(a_2, \nu_2)\ldots(a_n, \nu_n)$ of data symbols.

Given a word $w$, we denote by $w_\Sigma \stackrel{\text{def}}{=} a_1 \ldots a_n$ its sequence of input events and
by $w_D$ the valuation associating each time-stamped variable $x^{(i)}$, where $x \in X$,
the value $\nu_i(x)$, for all $i \in [1, n]$. We denote by $\varepsilon$ the empty sequence, by $\Sigma^*$ the
set of finite input sequences and by $\Sigma[X]^*$ the set of finite data words over the
variables $X$.

A first-order alternating automaton is a tuple $A = (\Sigma, X, Q, \iota, F, \Delta)$, where $\Sigma$
is a finite set of input events, $X$ is a finite set of input variables, $Q$ is a finite set of
predicates denoting control states, $\iota \in \mathrm{Form}^+(Q, \emptyset)$ is a sentence defining initial
configurations, $F \subseteq Q$ is the set of predicates denoting final states and $\Delta$ is a set
of transition rules. A transition rule is of the form $q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a(X)} \psi$, where
$q \in Q$ is a predicate, $a \in \Sigma$ is an input event and $\psi \in \mathrm{Form}^+(Q, X \cup \{y_1, \ldots, y_{\#(q)}\})$
is a positive formula, where $X \cap \{y_1, \ldots, y_{\#(q)}\} = \emptyset$. Without loss of generality,
we consider, for each predicate $q \in Q$ and each input event $a \in \Sigma$, at most one such
rule, as two or more rules can be joined using disjunction. The quantifiers occurring
in the right-hand side formula of a transition rule are called transition quantifiers.
The size of $A$ is $|A| \stackrel{\text{def}}{=} |\iota| + \sum\{|\psi| \mid q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a(X)} \psi \in \Delta\}$.

The semantics of first-order alternating automata is analogous to the semantics
of propositional alternating automata, with rules of the form $q \xrightarrow{a} \phi$, where
$q$ is a propositional variable and $\phi$ a positive boolean combination of propositional
variables. For instance, $q_0 \xrightarrow{a} (q_1 \wedge q_2) \vee q_3$ means that the automaton can
choose to transition in either both $q_1$ and $q_2$ or in $q_3$ alone. This leads to defining
transitions as the minimal models of the right-hand side of a rule¹. The original
definition of alternating automata [4] works around this problem and considers
boolean valuations instead of formulae. In contrast, a finite description of a
first-order alternating automaton cannot be given in terms of interpretations, as a
first-order formula may have infinitely many models, corresponding to infinitely
many initial or successor states occurring within an execution step.
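
The distinction between models and minimal models is easy to check mechanically
on the propositional example above. The Python sketch below (our own
illustration) enumerates all valuations of q1, q2 and q3, keeps those satisfying
(q1 ∧ q2) ∨ q3, and then keeps only the models with no strictly smaller model
under the pointwise order.

```python
from itertools import product


def models(formula, variables):
    """All sets of true variables (as frozensets) under which `formula` holds."""
    result = []
    for bits in product([False, True], repeat=len(variables)):
        true_vars = frozenset(v for v, bit in zip(variables, bits) if bit)
        if formula(true_vars):
            result.append(true_vars)
    return result


def minimal_models(formula, variables):
    """Models that have no strictly smaller model (pointwise subset order)."""
    ms = models(formula, variables)
    return [m for m in ms if not any(other < m for other in ms)]


def rhs(s):
    return ("q1" in s and "q2" in s) or "q3" in s  # (q1 and q2) or q3


print(sorted(sorted(m) for m in minimal_models(rhs, ["q1", "q2", "q3"])))
# [['q1', 'q2'], ['q3']]
```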

Given an uninterpreted predicate symbol $q \in Q$ and data values
$d_1, \ldots, d_{\#(q)} \in D$, the tuple $(q, d_1, \ldots, d_{\#(q)})$ is called a configuration,
sometimes written $q(d_1, \ldots, d_{\#(q)})$, when no confusion arises. A configuration is
final if $q \in F$. An interpretation $I$ corresponds to a set of configurations
$c(I) = \{(q, d_1, \ldots, d_{\#(q)}) \mid q \in Q,\ (d_1, \ldots, d_{\#(q)}) \in q^I\}$, called a cube. This
notation is lifted to sets of interpretations in the usual way.

¹ Both $\{q_1 \mapsto \top, q_2 \mapsto \top, q_3 \mapsto \bot\}$ and $\{q_1 \mapsto \bot, q_2 \mapsto \bot, q_3 \mapsto \top\}$ are minimal
models, however $\{q_1 \mapsto \top, q_2 \mapsto \bot, q_3 \mapsto \top\}$ is a model but is not minimal.


Definition 1. Given a word $w = (a_1, \nu_1)\ldots(a_n, \nu_n) \in \Sigma[X]^*$ and a cube $c$,
an execution of $A = (\Sigma, X, Q, \iota, F, \Delta)$ over $w$, starting with $c$, is a forest
$\mathcal{T} = \{T_1, T_2, \ldots\}$, where each $T_i$ is a tree labeled with configurations, such that:

1. $c = \{T(\varepsilon) \mid T \in \mathcal{T}\}$ is the set of configurations labeling the roots of $T_1, T_2, \ldots$,
and
2. if $(q, d_1, \ldots, d_{\#(q)})$ labels a node on the level $j \in [n-1]$ in $T_i$, then the labels of
its children form a cube from $c([\![\psi]\!]^\mu_\eta)$, where
$\eta = \nu_{j+1}[y_1 \leftarrow d_1, \ldots, y_{\#(q)} \leftarrow d_{\#(q)}]$ and
$q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a_{j+1}(X)} \psi \in \Delta$ is a transition rule of $A$.

An execution $\mathcal{T}$ over $w$, starting with $c$, is accepting if and only if all paths
in $\mathcal{T}$ have the same length and the frontier of each tree $T \in \mathcal{T}$ is labeled with
final configurations. If $A$ has an accepting execution over $w$ starting with a cube
$c \in c([\![\iota]\!]^\mu)$, then $A$ accepts $w$; we denote by $\mathcal{L}(A)$ the set of words accepted by $A$. For
example, consider the automaton $A = (\{a\}, \{x\}, \{q_0, q_1, q_2, q_f\}, q_0(0), \{q_f\}, \Delta)$,
where $\Delta$ is the set: $q_0(y) \xrightarrow{a(x)} q_1(y + x) \wedge q_2(y - x)$,
$q_1(y) \xrightarrow{a(x)} q_1(y + x) \vee (y > 0 \wedge q_f)$ and
$q_2(y) \xrightarrow{a(x)} q_2(y - x) \vee (y > 0 \wedge q_f)$. A possible execution tree of this
automaton is the following:


(q0,0)
  ├─ a,{x←1} → (q1,1) ─ a,{x←2} → (q1,3) ─ a,{x←3} → (q1,6) ─ a,{x←4} → (q1,10) ─ a,{x←5} → (qf)
  └─ a,{x←1} → (q2,−1) ─ a,{x←2} → (q2,−3) ─ a,{x←3} → (q2,−6) ─ a,{x←4} → (q2,−10) ─ a,{x←5} → (q2,−15)


The execution tree is not accepting, since its frontier is not labeled with final
configurations everywhere. Incidentally, here we have $\mathcal{L}(A) = \emptyset$, which is proved
by our tool in ~0.5s on an average machine.
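
For this particular automaton, executions can be explored by brute force on
concrete integer inputs. In the Python sketch below, which is our own
illustration and not the tool mentioned above, `step` returns, for one
configuration and one data value, the minimal models of its rule as sets of
configurations; a level of the forest is represented by the set of its labels. A
level containing qf before the last input is discarded, because qf has no rule
and an accepting execution needs all paths to have the same length. Since q1 and
q2 track a running sum and its negation, their guards y > 0 never hold together,
so no enumerated word is accepted; checking finitely many words illustrates, but
does not prove, that the language is empty.

```python
from itertools import product

FINAL = {"qf"}


def step(config, x):
    """Minimal models (sets of configurations) of the rule for `config` on a(x)."""
    state, *args = config
    if state == "q0":                   # q0(y) -a(x)-> q1(y+x) and q2(y-x)
        (y,) = args
        return [frozenset({("q1", y + x), ("q2", y - x)})]
    if state in ("q1", "q2"):           # qi(y) -a(x)-> qi(y +/- x) or (y > 0 and qf)
        (y,) = args
        nxt = y + x if state == "q1" else y - x
        options = [frozenset({(state, nxt)})]
        if y > 0:
            options.append(frozenset({("qf",)}))
        return options
    return []                           # qf has no rule: it cannot be extended


def frontiers_after(xs):
    """All possible last levels of executions from q0(0) over the data values xs."""
    frontiers = {frozenset({("q0", 0)})}
    for x in xs:
        nxt = set()
        for frontier in frontiers:
            # each configuration independently picks one minimal model
            for pick in product(*(step(c, x) for c in frontier)):
                nxt.add(frozenset().union(*pick))
        frontiers = nxt
    return frontiers


def accepts(xs):
    return any(all(c[0] in FINAL for c in f) for f in frontiers_after(xs))


print(accepts([1, 2, 3, 4, 5]))  # False
```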

In the rest of this paper, we are concerned with the following problems: 


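1. boolean closure: given automata $A_i = (\Sigma, X, Q_i, \iota_i, F_i, \Delta_i)$, for $i = 1, 2$, do
there exist automata $A_\cap$, $A_\cup$ and $\overline{A_1}$ such that $\mathcal{L}(A_\cap) = \mathcal{L}(A_1) \cap \mathcal{L}(A_2)$,
$\mathcal{L}(A_\cup) = \mathcal{L}(A_1) \cup \mathcal{L}(A_2)$ and $\mathcal{L}(\overline{A_1}) = \Sigma[X]^* \setminus \mathcal{L}(A_1)$?

2. emptiness: given an automaton $A$, is $\mathcal{L}(A) = \emptyset$?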


For technical reasons, we address the following problem next: given an automaton
$A$ and an input sequence $\alpha \in \Sigma^*$, does there exist a word $w \in \mathcal{L}(A)$ such that
$w_\Sigma = \alpha$? By solving this problem first, we develop the machinery required to
prove that first-order alternating automata are closed under complement and,
further, set up the ground for developing a practical semi-algorithm for the
emptiness problem.


2.1 Path Formulae 


In the upcoming developments it is sometimes more convenient to work with
logical formulae defining executions of automata than with low-level execution
forests. For this reason, we first introduce path formulae $\Theta(\alpha)$, which are
formulae defining the executions of an automaton over words that share a given
sequence $\alpha$ of input events. Second, we restrict a path formula $\Theta(\alpha)$ to an
acceptance formula $\Upsilon(\alpha)$, which defines only those executions that are accepting
among $\Theta(\alpha)$. Consequently, the automaton accepts a word $w$ such that $w_\Sigma = \alpha$
if and only if $\Upsilon(\alpha)$ is satisfiable.

Let $A = (\Sigma, X, Q, \iota, F, \Delta)$ be an automaton for the rest of this section. For
any $i \in \mathbb{N}$, we denote by $Q^{(i)} = \{q^{(i)} \mid q \in Q\}$ and $X^{(i)} = \{x^{(i)} \mid x \in X\}$ the sets
of time-stamped predicate symbols and variables, respectively. We also define
$Q^{(\leq n)} \stackrel{\text{def}}{=} \{q^{(i)} \mid q \in Q,\ i \in [n]\}$ and $X^{(\leq n)} \stackrel{\text{def}}{=} \{x^{(i)} \mid x \in X,\ i \in [n]\}$. For a formula
$\psi$ and $i \in \mathbb{N}$, we define $\psi^{(i)} \stackrel{\text{def}}{=} \psi[X^{(i)}/X, Q^{(i)}/Q]$, the formula in which all input
variables and state predicates (and only those symbols) are replaced by their
time-stamped counterparts. Moreover, we write $q(y)$ for $q(y_1, \ldots, y_{\#(q)})$, when
no confusion arises.

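Given a sequence of input events $\alpha = a_1 \ldots a_n \in \Sigma^*$, the path formula of $\alpha$ is:

$$\Theta(\alpha) \stackrel{\text{def}}{=} \iota^{(0)} \wedge \bigwedge_{i=1}^{n} \ \bigwedge_{q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a_i(X)} \psi \,\in\, \Delta} \forall y_1 \ldots \forall y_{\#(q)} .\ q^{(i-1)}(y_1, \ldots, y_{\#(q)}) \rightarrow \psi^{(i)} \qquad (1)$$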

The automaton $A$, to which $\Theta(\alpha)$ refers, will always be clear from the context.
To formalize the relation between the low-level configuration-based execution
semantics and path formulae, consider a word $w = (a_1, \nu_1)\ldots(a_n, \nu_n) \in \Sigma[X]^*$.
Any execution $\mathcal{T}$ of $A$ over $w$ has an associated interpretation $I_{\mathcal{T}}$ of the
time-stamped predicates $Q^{(\leq n)}$:


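$$I_{\mathcal{T}}(q^{(i)}) \stackrel{\text{def}}{=} \{(d_1, \ldots, d_{\#(q)}) \mid (q, d_1, \ldots, d_{\#(q)}) \text{ labels a node on level } i \text{ in } \mathcal{T}\}, \quad \forall q \in Q\ \forall i \in [n]$$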


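Lemma 1. Given an automaton $A = (\Sigma, X, Q, \iota, F, \Delta)$, for any word
$w = (a_1, \nu_1)\ldots(a_n, \nu_n)$, we have $[\![\Theta(w_\Sigma)]\!]^\mu_{w_D} = \{I_{\mathcal{T}} \mid \mathcal{T} \text{ is an execution of } A \text{ over } w\}$.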


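Next, we give a logical characterization of acceptance, relative to a given
sequence of input events $\alpha \in \Sigma^*$. To this end, we constrain the path formula
$\Theta(\alpha)$ by requiring that only final states of $A$ occur on the last level of the
execution. The result is the acceptance formula for $\alpha$:

$$\Upsilon(\alpha) \stackrel{\text{def}}{=} \Theta(\alpha) \wedge \bigwedge_{q \in Q \setminus F} \forall y_1 \ldots \forall y_{\#(q)} .\ q^{(n)}(y_1, \ldots, y_{\#(q)}) \rightarrow \bot \qquad (2)$$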


The top-level universal quantifiers from a subformula $\forall y_1 \ldots \forall y_{\#(q)} .\ q^{(i)}(y) \rightarrow \psi$
of $\Upsilon(\alpha)$ will be referred to as path quantifiers in the following. Notice that path
quantifiers are distinct from the transition quantifiers that occur within a formula
$\psi$ of a transition rule $q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a(X)} \psi$ of $A$. The relation between the
words accepted by $A$ and the acceptance formula above is formally captured by
the following lemma:


Lemma 2. Given an automaton $A = (\Sigma, X, Q, \iota, F, \Delta)$, for every word
$w \in \Sigma[X]^*$, the following are equivalent: (1) there exists an interpretation $I$ such
that $I, w_D \models \Upsilon(w_\Sigma)$ and (2) $w \in \mathcal{L}(A)$.
As an immediate consequence, one can decide whether $A$ accepts some word
$w$ with a given input sequence $w_\Sigma = \alpha$, by checking whether $\Upsilon(\alpha)$ is satisfiable.
However, unlike non-alternating infinite-state models of computation, such as 
counter automata (nondeterministic programs with integer variables), the satis- 
fiability query for an acceptance (path) formula falls outside of known decidable 
theories, supported by standard SMT solvers. There are basically two reasons 
for this, namely (i) the presence of predicate symbols, and (ii) the non-trivial 
alternation of quantifiers. To understand this point, consider for example, the 
decidable theory of Presburger arithmetic [24]. Adding even only one monadic 
predicate symbol to it yields undecidability in the presence of non-trivial quan- 
tifier alternation [10]. On the other hand, the quantifier-free fragment of Pres- 
burger arithmetic extended with uninterpreted function symbols is decidable, by 
a Nelson-Oppen style congruence closure argument [22]. 

To tackle the problem of deciding satisfiability of $\Upsilon(\alpha)$ formulae, we start
from the observation that their form is rather particular, which allows the elim- 
ination of path quantifiers and uninterpreted predicate symbols, by a couple of 
satisfiability-preserving transformations. The result of applying these transfor- 
mations is a formula with no predicate symbols, whose only quantifiers are those 
introduced by the transition rules of the automaton. Next, in Sect. 3 we shall 
assume moreover that the first-order theory of the data sort D (without uninter- 
preted predicate symbols) has quantifier elimination, providing thus an effective 
decision procedure. 

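For the time being, let us formally define the elimination of path quantifiers
and predicate symbols. Let $\alpha = a_1 \ldots a_n$ be a given sequence of input
events and let $\alpha_i$ be the prefix $a_1 \ldots a_i$ of $\alpha$, for $i \in [n]$, where $\alpha_0 = \varepsilon$. We
consider the sequence of formulae $\overline{\Theta}(\alpha_0), \ldots, \overline{\Theta}(\alpha_n)$ defined as
$\overline{\Theta}(\alpha_0) \stackrel{\text{def}}{=} \iota^{(0)}$ and, for all $i \in [1, n]$, let $\overline{\Theta}(\alpha_i)$ be the conjunction of
$\overline{\Theta}(\alpha_{i-1})$ with all formulae
$q^{(i-1)}(t_1, \ldots, t_{\#(q)}) \rightarrow \psi^{(i)}[t_1/y_1, \ldots, t_{\#(q)}/y_{\#(q)}]$ such that $q^{(i-1)}(t_1, \ldots, t_{\#(q)})$
occurs in $\overline{\Theta}(\alpha_{i-1})$, for some terms $t_1, \ldots, t_{\#(q)}$, where $q(y) \xrightarrow{a_i(X)} \psi \in \Delta$.
Next, we write $\overline{\Upsilon}(\alpha)$ for the conjunction of $\overline{\Theta}(\alpha)$ with all formulae
$q^{(n)}(t_1, \ldots, t_{\#(q)}) \rightarrow \bot$, such that $q^{(n)}(t_1, \ldots, t_{\#(q)})$
occurs in $\overline{\Theta}(\alpha)$, for some $q \in Q \setminus F$. Note that $\overline{\Upsilon}(\alpha)$ contains no path quantifiers,
as required. On the other hand, the scope of the transition quantifiers in $\overline{\Upsilon}(\alpha)$
exceeds the right-hand side formulae from the transition rules, as shown by the
following example.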


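Example 1. Consider the automaton $A = (\{a_1, a_2\}, \{x\}, \{q, q_f\}, \iota, \{q_f\}, \Delta)$,
where:

$$\iota = \exists z .\, z \geq 0 \wedge q(z)$$
$$\Delta = \{q(y) \xrightarrow{a_1(x)} x \geq 0 \wedge \forall z .\, z \geq y \rightarrow q(x + z),\ \ q(y) \xrightarrow{a_2(x)} y \leq 0 \wedge q_f(x + y)\}$$

For the input event sequence $\alpha = a_1 a_2$, the acceptance formula is:

$$\begin{aligned}
\Upsilon(\alpha) = {} & \exists z_1 .\, z_1 \geq 0 \wedge q^{(0)}(z_1) \wedge {} \\
& \forall y .\, q^{(0)}(y) \rightarrow [x^{(1)} \geq 0 \wedge \forall z_2 .\, z_2 \geq y \rightarrow q^{(1)}(x^{(1)} + z_2)] \wedge {} \\
& \forall y .\, q^{(1)}(y) \rightarrow [y \leq 0 \wedge q_f^{(2)}(x^{(2)} + y)] \wedge {} \\
& \forall y .\, q^{(2)}(y) \rightarrow \bot
\end{aligned}$$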
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The result of eliminating the path quantifiers, in prenex normal form, is shown 
below: 


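$$\overline{\Upsilon}(\alpha) = \exists z_1 \forall z_2 \,.\, z_1 > 0 \wedge q^{(0)}(z_1) \wedge [q^{(0)}(z_1) \rightarrow x^{(1)} \geq 0 \wedge (z_2 \geq z_1 \rightarrow q_f^{(1)}(x^{(1)} + z_2))] \wedge [q_f^{(1)}(x^{(1)} + z_2) \rightarrow x^{(1)} + z_2 \leq 0 \wedge q_f^{(2)}(x^{(2)} + x^{(1)} + z_2)]$$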


Notice that the transition quantifiers $\exists z_1$ and $\forall z_2$ from $\Upsilon(\alpha)$ now range over the entire formula $\overline{\Upsilon}(\alpha)$.


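Lemma 3. For any input event sequence $\alpha = a_1 \ldots a_n$ and each valuation $\nu : X^{(\leq n)} \rightarrow D$, the following hold, for every interpretation $\mathcal{I}$: (1) if $\mathcal{I}, \nu \models \Upsilon(\alpha)$ then $\mathcal{I}, \nu \models \overline{\Upsilon}(\alpha)$, and (2) if $\mathcal{I}, \nu \models \overline{\Upsilon}(\alpha)$ there exists an interpretation $\mathcal{J} \subseteq \mathcal{I}$ such that $\mathcal{J}, \nu \models \Upsilon(\alpha)$.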


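Further, we eliminate the predicate atoms from $\overline{\Upsilon}(\alpha)$ by considering the sequence of formulae $\widehat{\Theta}(\alpha_0), \ldots, \widehat{\Theta}(\alpha_n)$, where $\widehat{\Theta}(\alpha_0) = \iota^{(0)}$ and $\widehat{\Theta}(\alpha_i)$ is obtained by substituting each predicate atom $q^{(i-1)}(t_1, \ldots, t_{\#(q)})$ in $\widehat{\Theta}(\alpha_{i-1})$ by $\psi^{(i)}[t_1/y_1, \ldots, t_{\#(q)}/y_{\#(q)}]$, where $q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a_i(X)} \psi \in \Delta$, for all $i \in [1,n]$. We write $\widehat{\Upsilon}(\alpha)$ for the formula obtained by replacing, in $\widehat{\Theta}(\alpha_n)$, each occurrence of a predicate $q^{(n)}$ such that $q \in Q \setminus F$ (resp. $q \in F$) by $\bot$ (resp. $\top$).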


Example 2 (Contd. from Example 1). The result of the elimination of predicate atoms from the acceptance formula in Example 1 is shown below:

$$\widehat{\Upsilon}(\alpha) = \exists z_1 \forall z_2 \,.\, z_1 > 0 \wedge [x^{(1)} \geq 0 \wedge (z_2 \geq z_1 \rightarrow x^{(1)} + z_2 \leq 0)]$$

Since this formula is unsatisfiable, by Lemma 5 below, no word $w$ with input event sequence $w_\Sigma = a_1 a_2$ is accepted by the automaton $\mathcal{A}$ from Example 1.
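As a sanity check, the sketch below evaluates the formula of Example 2, read as $\exists z_1 \forall z_2 \,.\, z_1 > 0 \wedge x^{(1)} \geq 0 \wedge (z_2 \geq z_1 \rightarrow x^{(1)} + z_2 \leq 0)$, over a bounded box of integers. The bound `B` and all function names are illustrative assumptions; a bounded search can only illustrate unsatisfiability, it does not prove it over all integers. The second check instantiates the universal variable $z_2$ with $z_1$, which falsifies the matrix for every candidate pair.

```python
from itertools import product

# Hypothetical bound: only integers in [-B, B] are inspected.
B = 6
DOM = range(-B, B + 1)


def matrix(x1, z1, z2):
    """Matrix of Example 2: z1 > 0 and x1 >= 0 and (z2 >= z1 -> x1 + z2 <= 0)."""
    return z1 > 0 and x1 >= 0 and (z2 < z1 or x1 + z2 <= 0)


def bounded_models():
    """Pairs (x1, z1) in the box such that the matrix holds for every z2 in the box."""
    return [(x1, z1) for x1, z1 in product(DOM, DOM)
            if all(matrix(x1, z1, z2) for z2 in DOM)]


def refuted_by_witness(x1, z1):
    """Instantiating the universal z2 with z1 falsifies the matrix."""
    return not matrix(x1, z1, z1)


print(bounded_models())  # []
print(all(refuted_by_witness(x1, z1) for x1, z1 in product(DOM, DOM)))  # True
```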


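At this point, we prove the formal relation between the satisfiability of the formulae $\overline{\Upsilon}(\alpha)$ and $\widehat{\Upsilon}(\alpha)$. Since there are no occurrences of predicates in $\widehat{\Upsilon}(\alpha)$, for each valuation $\nu : X^{(\leq n)} \rightarrow D$, there exists an interpretation $\mathcal{I}$ such that $\mathcal{I}, \nu \models \widehat{\Upsilon}(\alpha)$ if and only if $\mathcal{J}, \nu \models \widehat{\Upsilon}(\alpha)$, for every interpretation $\mathcal{J}$. In this case we omit $\mathcal{J}$ and simply write $\nu \models \widehat{\Upsilon}(\alpha)$.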


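Lemma 4. For any input event sequence $\alpha = a_1 \ldots a_n$ and each valuation $\nu : X^{(\leq n)} \rightarrow D$, there exists an interpretation $\mathcal{I}$ such that $\mathcal{I}, \nu \models \overline{\Upsilon}(\alpha)$ if and only if $\nu \models \widehat{\Upsilon}(\alpha)$.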


Finally, we characterize the acceptance of a word with a given input event sequence by means of a formula in which neither path quantifiers nor predicate atoms occur.


Lemma 5. Given an automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$, for every word $w \in \Sigma[X]^*$, we have $w_D \models \widehat{\Upsilon}(w_\Sigma)$ if and only if $w \in \mathcal{L}(\mathcal{A})$.
2.2 Boolean Closure of First Order Alternating Automata
Given a positive formula $\varphi$, we define the dual formula $\varphi^{\sim}$ recursively as follows:


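$$\begin{array}{ll}
(t \approx s)^{\sim} \equiv t \not\approx s & (t \not\approx s)^{\sim} \equiv t \approx s \\
(\varphi_1 \vee \varphi_2)^{\sim} \equiv \varphi_1^{\sim} \wedge \varphi_2^{\sim} & (\varphi_1 \wedge \varphi_2)^{\sim} \equiv \varphi_1^{\sim} \vee \varphi_2^{\sim} \\
(\exists x \,.\, \varphi_1)^{\sim} \equiv \forall x \,.\, \varphi_1^{\sim} & (\forall x \,.\, \varphi_1)^{\sim} \equiv \exists x \,.\, \varphi_1^{\sim} \\
q(x_1, \ldots, x_{\#(q)})^{\sim} \equiv q(x_1, \ldots, x_{\#(q)}) &
\end{array}$$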


The following theorem shows closure of automata under all Boolean operations. Note that it is sufficient to show closure under intersection and negation because $\mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2)$ is the complement of the language $\overline{\mathcal{L}(\mathcal{A}_1)} \cap \overline{\mathcal{L}(\mathcal{A}_2)}$, for any two automata $\mathcal{A}_1$ and $\mathcal{A}_2$ with the same input event alphabet and set of input variables.


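Theorem 1. Given automata $\mathcal{A}_i = (\Sigma, X, Q_i, \iota_i, F_i, \Delta_i)$, for $i = 1,2$, such that $Q_1 \cap Q_2 = \emptyset$, the following hold:

1. $\mathcal{L}(\mathcal{A}_\cap) = \mathcal{L}(\mathcal{A}_1) \cap \mathcal{L}(\mathcal{A}_2)$, where $\mathcal{A}_\cap = (\Sigma, X, Q_1 \cup Q_2, \iota_1 \wedge \iota_2, F_1 \cup F_2, \Delta_1 \cup \Delta_2)$,

2. $\mathcal{L}(\overline{\mathcal{A}_i}) = \Sigma[X]^* \setminus \mathcal{L}(\mathcal{A}_i)$, where $\overline{\mathcal{A}_i} = (\Sigma, X, Q_i, \iota_i^{\sim}, Q_i \setminus F_i, \Delta_i^{\sim})$ and $\Delta_i^{\sim} = \{ q(y) \xrightarrow{a(X)} \psi^{\sim} \mid q(y) \xrightarrow{a(X)} \psi \in \Delta_i \}$, for $i = 1,2$.

Moreover, $|\mathcal{A}_\cap| = O(|\mathcal{A}_1| + |\mathcal{A}_2|)$ and $|\overline{\mathcal{A}_i}| = O(|\mathcal{A}_i|)$, for $i = 1,2$.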
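Both constructions of Theorem 1 are purely syntactic, which makes them easy to sketch. Below, formulas are nested Python tuples and automata are a small frozen dataclass; both encodings and the names `dual`, `intersect` and `complement` are illustrative assumptions, not the paper's implementation. Only the atoms $\approx$ and $\not\approx$ from the dual rules are handled; other interpreted atoms would need their own dual.

```python
from dataclasses import dataclass


def dual(phi):
    """Dual of a positive formula encoded as nested tuples (hypothetical encoding)."""
    tag = phi[0]
    if tag == "eq":
        return ("neq", phi[1], phi[2])
    if tag == "neq":
        return ("eq", phi[1], phi[2])
    if tag == "and":
        return ("or", dual(phi[1]), dual(phi[2]))
    if tag == "or":
        return ("and", dual(phi[1]), dual(phi[2]))
    if tag == "exists":
        return ("forall", phi[1], dual(phi[2]))
    if tag == "forall":
        return ("exists", phi[1], dual(phi[2]))
    if tag == "pred":
        return phi  # predicate atoms are left unchanged
    raise ValueError(f"not a positive formula: {phi!r}")


@dataclass(frozen=True)
class Automaton:
    sigma: frozenset  # input event alphabet
    xs: frozenset     # input variables X
    preds: frozenset  # predicate symbols Q
    init: tuple       # initial formula iota
    final: frozenset  # final predicates F
    rules: tuple      # (q, params, event, psi) encodes q(params) --event(X)--> psi


def intersect(a1, a2):
    """Construction 1 of Theorem 1: conjoin initial formulas, unite the rest."""
    assert a1.sigma == a2.sigma and a1.xs == a2.xs
    assert not (a1.preds & a2.preds), "predicate sets must be disjoint"
    return Automaton(a1.sigma, a1.xs, a1.preds | a2.preds,
                     ("and", a1.init, a2.init), a1.final | a2.final,
                     a1.rules + a2.rules)


def complement(a):
    """Construction 2 of Theorem 1: dualize formulas, swap final and non-final."""
    return Automaton(a.sigma, a.xs, a.preds, dual(a.init), a.preds - a.final,
                     tuple((q, ys, ev, dual(psi)) for q, ys, ev, psi in a.rules))
```

Because dualization is an involution and $Q \setminus (Q \setminus F) = F$, complementing twice returns the original automaton, which is a cheap consistency check for such an encoding.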


3 The Emptiness Problem 


The emptiness problem is undecidable even for automata with predicates of arity 
two, whose transition rules use only equalities and disequalities, having no tran- 
sition quantifiers [6]. Since even such simple classes of alternating automata have 
no general decision procedure for emptiness, we use an abstraction-refinement 
semi-algorithm based on lazy annotation [20,21]. In a nutshell, a lazy annota- 
tion procedure systematically explores the set of finite input event sequences 
searching for an accepting execution. For an input sequence, if the path formula 
is satisfiable, we compute a word in the language of the automaton, from the 
model of the path formula. Otherwise, i.e., if the sequence is spurious, the search backtracks and each position in the sequence is annotated with an interpolant, thus marking the sequence as infeasible. The semi-algorithm moreover uses a coverage relation between sequences, ensuring that the continuations of already covered sequences are never explored. When the automaton is empty, this coverage relation sometimes provides a sound termination argument.

For two input event sequences $\alpha, \beta \in \Sigma^*$, we say that $\alpha$ is a prefix of $\beta$, written $\alpha \preceq \beta$, if $\beta = \alpha\gamma$ for some sequence $\gamma \in \Sigma^*$. A set $S$ of sequences is prefix-closed if, for each $\alpha \in S$, $\beta \preceq \alpha$ implies $\beta \in S$, and complete if, for each $\alpha \in S$, $\alpha a \in S$ for some $a \in \Sigma$ implies $\alpha b \in S$ for all $b \in \Sigma$. A prefix-closed set is the backbone of a tree whose edges are labeled with input events. If the set is, moreover, complete, then every node of the tree either has zero successors, in which case it is called a leaf, or has a successor edge labeled with $a$ for each input event $a \in \Sigma$.
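The prefix order and the two closure conditions can be checked directly on finite sets of sequences. In this sketch, sequences are Python tuples of event names; the function names are illustrative, not taken from the paper.

```python
def is_prefix(alpha, beta):
    """alpha is a prefix of beta, i.e. beta = alpha + gamma for some gamma."""
    return beta[:len(alpha)] == alpha


def is_prefix_closed(S):
    """Every proper prefix of every sequence in S is also in S."""
    return all(alpha[:i] in S for alpha in S for i in range(len(alpha)))


def is_complete(S, sigma):
    """Each sequence in S has either no successor in S or one for every event."""
    for alpha in S:
        succ = [alpha + (a,) in S for a in sigma]
        if any(succ) and not all(succ):
            return False
    return True


def leaves(S, sigma):
    """Nodes of the tree spanned by S that have no successor in S."""
    return {alpha for alpha in S if all(alpha + (a,) not in S for a in sigma)}


sigma = ("a1", "a2")
S = {(), ("a1",), ("a2",), ("a1", "a1"), ("a1", "a2")}
print(is_prefix_closed(S), is_complete(S, sigma))  # True True
```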
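Definition 2. An unfolding of an automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$ is a finite partial mapping $U : \Sigma^* \rightharpoonup_{fin} \mathrm{Form}^+(Q, \emptyset)$, whose domain $dom(U)$ is a finite prefix-closed complete set, such that $U(\varepsilon) = \iota$, and for each sequence $\alpha a \in dom(U)$, such that $\alpha \in \Sigma^*$ and $a \in \Sigma$:

$$U(\alpha)^{(0)} \wedge \bigwedge_{q(y) \xrightarrow{a(X)} \psi \in \Delta} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(0)}(y) \rightarrow \psi^{(1)} \models U(\alpha a)^{(1)}$$

A path $\alpha$ is safe in $U$ if and only if $U(\alpha) \wedge \bigwedge_{q \in Q \setminus F} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q(y) \rightarrow \bot$ is unsatisfiable. The unfolding $U$ is safe if and only if every path in $dom(U)$ is safe in $U$.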


Lazy annotation semi-algorithms [20,21] build unfoldings of automata trying to discover counterexamples for emptiness. If the automaton $\mathcal{A}$ in question is non-empty, a systematic enumeration of the input event sequences² from $\Sigma^*$ will suffice to discover a word $w \in \mathcal{L}(\mathcal{A})$, provided that the first-order theory of the data domain D is decidable (Lemma 2). However, if $\mathcal{L}(\mathcal{A}) = \emptyset$, the enumeration of input event sequences may, in principle, run forever. The typical way of dealing with this divergence problem is to define a coverage relation between the nodes of the unfolding tree.
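The systematic enumeration mentioned above can be sketched as a breadth-first search over $\Sigma^*$. Here `feasible` is a hypothetical oracle standing in for a decision procedure for the path formula of a sequence, and `max_len` is an artificial cut-off added only so that the sketch terminates; without it, the search may run forever when the language is empty.

```python
from collections import deque


def find_feasible_sequence(sigma, feasible, max_len):
    """Breadth-first enumeration of input event sequences over sigma.

    Returns the first sequence (in BFS order) accepted by the oracle,
    or None if no sequence of length <= max_len is feasible."""
    queue = deque([()])
    while queue:
        alpha = queue.popleft()
        if feasible(alpha):
            return alpha
        if len(alpha) < max_len:
            queue.extend(alpha + (a,) for a in sigma)
    return None


# Toy oracle: a sequence is "feasible" if it contains the event a2 twice.
print(find_feasible_sequence(("a1", "a2"), lambda s: s.count("a2") == 2, 5))
# ('a2', 'a2')
```

Breadth-first order guarantees that a shortest feasible sequence is found first, which is why it is a natural choice for the enumeration.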


Definition 3. Given an unfolding $U$ of an automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$, a node $\alpha \in dom(U)$ is covered by another node $\beta \in dom(U)$, denoted $\alpha \sqsubseteq \beta$, if and only if there exists a node $\alpha' \preceq \alpha$ such that $U(\alpha') \models U(\beta)$. Moreover, $U$ is closed if and only if every leaf from $dom(U)$ is covered by an uncovered node.
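To make Definition 3 concrete without a theorem prover, the sketch below models each label as the finite set of its models, so that $U(\alpha') \models U(\beta)$ becomes set inclusion. This semantic stand-in and the function names are assumptions for illustration only. The example also shows the inheritance property discussed next: once a node is covered, so are its continuations.

```python
def entails(phi, psi):
    # Stand-in for U(alpha') |= U(beta): labels are finite sets of models,
    # so entailment is set inclusion.
    return phi <= psi


def is_covered(U, alpha, beta):
    """Definition 3: alpha is covered by another node beta iff some prefix
    alpha' of alpha (alpha itself included) has U(alpha') |= U(beta)."""
    if alpha == beta:
        return False
    return any(entails(U[alpha[:i]], U[beta]) for i in range(len(alpha) + 1))


U = {
    (): frozenset({0, 1, 2}),
    ("a",): frozenset({1}),
    ("b",): frozenset({1, 2}),
    ("a", "a"): frozenset({1}),
    ("a", "b"): frozenset({3}),
}
print(is_covered(U, ("a",), ("b",)))       # True
print(is_covered(U, ("a", "b"), ("b",)))   # True, inherited from prefix ("a",)
```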


A lazy annotation semi-algorithm will stop and report emptiness provided that it succeeds in building a closed and safe unfolding of the automaton. Notice that, by Definition 3, for any three nodes of an unfolding $U$, say $\alpha, \beta, \gamma \in dom(U)$, if $\alpha \preceq \beta$ and $\alpha \sqsubseteq \gamma$, then $\beta \sqsubseteq \gamma$ as well. As we show next (Theorem 2), there is no need to expand covered nodes because, intuitively, there exists a word $w \in \mathcal{L}(\mathcal{A})$ such that $\alpha \preceq w_\Sigma$ and $\alpha \sqsubseteq \gamma$ only if there exists another word $u \in \mathcal{L}(\mathcal{A})$ such that $\gamma \preceq u_\Sigma$. Hence, exploring only those input event sequences that are continuations of $\gamma$ (and ignoring those of $\alpha$) suffices in order to find a counterexample for emptiness, if one exists.

An unfolding node $\alpha \in dom(U)$ is said to be spurious if and only if $\Upsilon(\alpha)$ is unsatisfiable. In this case, we change (refine) the labels of (some of the) prefixes of $\alpha$ (and that of $\alpha$), such that $U(\alpha)$ becomes $\bot$, thus indicating that there is no real execution of the automaton along that input event sequence. As a result of the change of labels, if a node $\gamma \preceq \alpha$ used to cover another node from $dom(U)$, it might not cover it with the new label. Therefore, the coverage relation has to be recomputed after each refinement of the labeling. The semi-algorithm stops when (and if) a safe closed unfolding has been found.


Theorem 2. If an automaton $\mathcal{A}$ has a nonempty safe closed unfolding then $\mathcal{L}(\mathcal{A}) = \emptyset$.


2 For instance, using breadth-first search.
Algorithm 1. IMPACT-based Semi-algorithm for First Order Alternating Automata
input: a first order alternating automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$
output: $\top$ if $\mathcal{L}(\mathcal{A}) = \emptyset$, or a word $w \in \mathcal{L}(\mathcal{A})$ otherwise
data structures: WorkList and unfolding tree $\mathcal{U} = (N, E, r, U, \lhd)$, where:
- $N$ is a set of nodes,
- $E \subseteq N \times \Sigma \times N$ is a set of edges labeled by input events,
- $U : N \rightarrow \mathrm{Form}^+(Q, \emptyset)$ is a labeling of nodes with positive sentences,
- $\lhd \subseteq N \times N$ is a coverage relation,
initially WorkList $= \{r\}$ and $N = E = U = \lhd = \emptyset$.
 1: while WorkList $\neq \emptyset$ do
 2:   dequeue $n$ from WorkList
 3:   $N \leftarrow N \cup \{n\}$
 4:   let $\alpha(n)$ be $a_1, \ldots, a_k$
 5:   if $\widehat{\Upsilon}(\alpha)(X^{(1)}, \ldots, X^{(k)})$ is satisfiable then    ▷ counterexample is feasible
 6:     get model $\nu$ of $\widehat{\Upsilon}(\alpha)(X^{(1)}, \ldots, X^{(k)})$
 7:     return $w = (a_1, \nu(X^{(1)})) \ldots (a_k, \nu(X^{(k)}))$    ▷ $w \in \mathcal{L}(\mathcal{A})$ by construction
 8:   else    ▷ spurious counterexample
 9:     let $(I_0, \ldots, I_k)$ be a GLI for $\alpha$
10:     $b \leftarrow \bot$
11:     for $i = 0, \ldots, k$ do
12:       if $U(n_i) \not\models I_i$ then
13:         Uncover $\leftarrow \{m \in N \mid (m, n_i) \in \lhd\}$
14:         $\lhd \leftarrow \lhd \setminus \{(m, n_i) \mid m \in$ Uncover$\}$    ▷ uncover the nodes covered by $n_i$
15:         for $m \in$ Uncover such that $m$ is a leaf of $\mathcal{U}$ do
16:           enqueue $m$ into WorkList    ▷ reactivate uncovered leaves
17:         $U(n_i) \leftarrow U(n_i) \wedge J_i$    ▷ strengthen the label of $n_i$ (Lemma 7)
18:         if $\neg b$ then
19:           $b \leftarrow$ CLOSE($n_i$)
20:   if $n$ is not covered then
21:     for $a \in \Sigma$ do    ▷ expand $n$
22:       let $s$ be a fresh node and $e = (n, a, s)$ be a new edge
23:       $E \leftarrow E \cup \{e\}$
24:       $U \leftarrow U \cup \{(s, \top)\}$
25:       enqueue $s$ into WorkList
26: return $\top$

27: function CLOSE($x$) returns $\mathbb{B}$
28:   for $y \in N$ such that $\alpha(y) <^* \alpha(x)$ do
29:     if $U(x) \models U(y)$ then
30:       $\lhd \leftarrow [\lhd \setminus \{(p, q) \in \lhd \mid q$ is $x$ or a successor of $x\}] \cup \{(x, y)\}$
31:       return $\top$
32:   return $\bot$


We describe the semi-algorithm used to check emptiness of first-order alternating automata. The execution of Algorithm 1 consists of three phases, corresponding to the CLOSE, REFINE and EXPAND phases of the original IMPACT procedure [20]. Let $n$ be a node removed from the worklist at line 2 and let $\alpha(n)$ be the input sequence labeling the path from the root node to $n$. If $\widehat{\Upsilon}(\alpha(n))$ is satisfiable, the sequence $\alpha(n)$ is feasible, in which case a model of $\widehat{\Upsilon}(\alpha(n))$ is obtained and a word $w \in \mathcal{L}(\mathcal{A})$ is returned. Otherwise, $\alpha(n)$ is an infeasible input sequence and the procedure enters the refinement phase (lines 9-19). The GLI for $\alpha(n)$ is used to strengthen the labels of all the ancestors of $n$, by conjoining the formulae of the interpolant, changed according to Lemma 7, to the existing labels.

In this process, the nodes on the path between $r$ and $n$, including $n$, might become eligible for coverage; therefore, we attempt to close each ancestor of $n$ that is impacted by the refinement (line 19). Observe that, in this case, the call to CLOSE must uncover each node which is covered by a successor of $n$ (line 30 of the CLOSE function). This is required because, due to the over-approximation of the sets of reachable configurations, the covering relation is not transitive, as explained in [20]. If CLOSE adds a covering edge $(n_i, m)$ to $\lhd$, it does not have to be called for the successors of $n_i$ on this path, which is handled via the Boolean flag $b$. Finally, if $n$ is still uncovered (i.e., it has not been covered during the refinement phase), we expand $n$ (lines 21-25) by creating a new node for each successor $s$ via the input event $a \in \Sigma$ and inserting it into the worklist.
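The refinement phase (lines 10-19) and CLOSE can be sketched under strong simplifying assumptions: labels are finite sets of abstract states instead of formulas (so entailment is inclusion and conjunction is intersection), the interpolant formulas are given directly as such sets, and the order $<^*$ used by CLOSE is taken to be shortlex order on input sequences. The class and method names are hypothetical; this is a toy model of the bookkeeping, not the authors' tool.

```python
class Unfolding:
    """Toy unfolding: nodes are input sequences (tuples); labels are finite
    sets standing in for formulas, so |= is inclusion and AND is intersection."""

    def __init__(self, labels):
        self.label = dict(labels)   # node -> frozenset, stand-in for U(node)
        self.covers = set()         # pairs (m, n): node m is covered by node n
        self.worklist = []

    def is_leaf(self, n):
        return not any(len(m) == len(n) + 1 and m[:len(n)] == n for m in self.label)

    def close(self, x):
        """CLOSE(x), visiting candidate nodes y in shortlex order (assumed <*)."""
        for y in sorted(self.label, key=lambda s: (len(s), s)):
            if (len(y), y) >= (len(x), x):
                break
            if self.label[x] <= self.label[y]:
                # drop coverings by x or by a successor of x, then let y cover x
                self.covers = {(p, q) for (p, q) in self.covers if q[:len(x)] != x}
                self.covers.add((x, y))
                return True
        return False

    def refine(self, path, interpolant):
        """Lines 10-19: path = [n_0, ..., n_k], interpolant = [J_0, ..., J_k]."""
        b = False
        for n_i, j_i in zip(path, interpolant):
            if not self.label[n_i] <= j_i:
                uncover = {m for (m, q) in self.covers if q == n_i}
                self.covers -= {(m, n_i) for m in uncover}
                self.worklist.extend(m for m in sorted(uncover) if self.is_leaf(m))
                self.label[n_i] = self.label[n_i] & j_i
                if not b:
                    b = self.close(n_i)
```

In the usage below, refining node `("a",)` reactivates the leaf it used to cover, and the strengthened node then becomes covered by the root.

```python
full = frozenset({0, 1, 2, 3})
u = Unfolding({(): full, ("a",): full, ("b",): full, ("b", "a"): frozenset({2})})
u.covers = {(("b", "a"), ("a",))}
u.refine([(), ("a",)], [full, frozenset({1})])
print(u.worklist, u.covers)   # [('b', 'a')] {(('a',), ())}
```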


4 Interpolant Generation 


Typically, when checking the unreachability of a set of program configurations, 
the interpolants used to annotate the unfolded control structure are assertions 
about the values of the program variables in a given control state, at a certain 
step of an execution [20]. Because we consider alternating computation trees 
(forests), we must distinguish between (i) locality of interpolants w.r.t. a given 
control state (control locality) and (ii) locality w.r.t. a given time stamp (time 
locality). In logical terms, control-local interpolants are formulae involving a 
single predicate symbol, whereas time-local interpolants involve only predicates 
$q^{(i)}$ and variables $x^{(i)}$, for a single $i \geq 0$. When considering alternating executions,
control-local interpolants are not always enough to prove emptiness, because of 
the synchronization of several branches of the computation on the same input 
word. For this reason, the interpolants considered in this paper will never be 
control-local and we shall use the term local to denote time-local interpolants, 
with no free variables. 

First, let us give the formal definition of the class of interpolants we shall work with. Given a formula $\varphi$, the vocabulary of $\varphi$, denoted $\mathcal{V}(\varphi)$, is the set of predicate symbols $q \in Q^{(i)}$ and variables $x \in X^{(i)}$ occurring in $\varphi$, for some $i \geq 0$. For a term $t$, its vocabulary $\mathcal{V}(t)$ is the set of variables that occur in $t$. Observe that quantified variables and the interpreted function symbols of the data theory³ do not belong to the vocabulary of a formula. By $\mathcal{P}^+(\varphi)$ [$\mathcal{P}^-(\varphi)$] we denote the set of predicate symbols that occur in $\varphi$ under an even [odd] number of negations.
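The vocabulary $\mathcal{V}(\varphi)$ and the polarity sets $\mathcal{P}^+(\varphi)$, $\mathcal{P}^-(\varphi)$ can be computed by one traversal that tracks bound variables and flips polarity under negation and on the left of an implication. The tuple encoding and the function name below are illustrative assumptions. The usage reads the second conjunct of the conjunction in Example 3, $q^{(0)}(z_1) \rightarrow x^{(1)} \geq 0 \wedge q_f^{(1)}(x^{(1)} + z_1)$, where $z_1$ is free after the witness substitution.

```python
def vocabulary_and_polarity(phi):
    """Return (V(phi), P+(phi), P-(phi)) for a formula encoded as nested tuples.

    Hypothetical encoding: ("pred", q, i, args) is q^(i)(args); ("var", x, i) is
    x^(i) (i is None for transition variables); ("const", c); ("app", f, args)
    applies an interpreted function; ("atom", op, t1, t2) is an interpreted atom;
    connectives are ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g),
    ("exists", x, f) and ("forall", x, f)."""
    vocab, pos, neg = set(), set(), set()

    def term(t, bound):
        if t[0] == "var" and t[1] not in bound:
            vocab.add((t[1], t[2]))          # only free variables belong to V
        elif t[0] == "app":
            for arg in t[2]:
                term(arg, bound)             # function symbols are not in V

    def walk(f, positive, bound):
        tag = f[0]
        if tag == "pred":
            sym = (f[1], f[2])
            vocab.add(sym)
            (pos if positive else neg).add(sym)
            for arg in f[3]:
                term(arg, bound)
        elif tag == "atom":
            term(f[2], bound)
            term(f[3], bound)
        elif tag == "not":
            walk(f[1], not positive, bound)
        elif tag in ("and", "or"):
            walk(f[1], positive, bound)
            walk(f[2], positive, bound)
        elif tag == "implies":               # f -> g is read as (not f) or g
            walk(f[1], not positive, bound)
            walk(f[2], positive, bound)
        elif tag in ("exists", "forall"):
            walk(f[2], positive, bound | {f[1]})
        else:
            raise ValueError(f"unknown node {tag!r}")

    walk(phi, True, frozenset())
    return vocab, pos, neg


z1, x1 = ("var", "z1", None), ("var", "x", 1)
f = ("implies", ("pred", "q", 0, (z1,)),
     ("and", ("atom", ">=", x1, ("const", 0)),
             ("pred", "qf", 1, (("app", "+", (x1, z1)),))))
V, P, N = vocabulary_and_polarity(f)
print(sorted(P), sorted(N))   # [('qf', 1)] [('q', 0)]
```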


3 E.g., the arithmetic operators of addition and multiplication, when D is the set of 
integers. 
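Definition 4 ([19]). Given formulae $\varphi$ and $\psi$ such that $\varphi \wedge \psi$ is unsatisfiable, a Lyndon interpolant is a formula $I$ such that $\varphi \models I$, the formula $I \wedge \psi$ is unsatisfiable, $\mathcal{V}(I) \subseteq \mathcal{V}(\varphi) \cap \mathcal{V}(\psi)$, $\mathcal{P}^+(I) \subseteq \mathcal{P}^+(\varphi) \cap \mathcal{P}^-(\psi)$ and $\mathcal{P}^-(I) \subseteq \mathcal{P}^-(\varphi) \cap \mathcal{P}^+(\psi)$.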

In the rest of this section, fix an automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$. The following definition generalizes interpolants from unsatisfiable conjunctions to input sequences:


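Definition 5. Given a sequence of input events $\alpha = a_1 \ldots a_n \in \Sigma^*$, a generalized Lyndon interpolant (GLI) is a sequence $(I_0, \ldots, I_n)$ of formulae such that, for all $k \in [n-1]$, the following hold: (1) $\mathcal{P}^-(I_k) = \emptyset$, (2) $\iota^{(0)} \models I_0$ and $I_k \wedge \big(\bigwedge_{q(y) \xrightarrow{a_{k+1}(X)} \psi \in \Delta} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(k)}(y) \rightarrow \psi^{(k+1)}\big) \models I_{k+1}$, and (3) $I_n \wedge \bigwedge_{q \in Q \setminus F} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(n)}(y) \rightarrow \bot$ is unsatisfiable. Moreover, the GLI is local if and only if $\mathcal{V}(I_k) \subseteq Q^{(k)}$, for all $k \in [n]$.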


The following proposition states the existence of local GLI for the theories in 
which Lyndon’s Interpolation Theorem holds. 


Proposition 1. If there exists a Lyndon interpolant for any two formulae $\varphi$ and $\psi$ in the first-order theory of data with uninterpreted predicate symbols such that $\varphi \wedge \psi$ is unsatisfiable, then any sequence of input events $\alpha = a_1 \ldots a_n \in \Sigma^*$ such that $\Upsilon(\alpha)$ is unsatisfiable has a local GLI $(I_0, \ldots, I_n)$.


A problematic point of the above proposition is that the existence of Lyndon interpolants (Definition 4) is proved in principle, but the proof is non-constructive. In other words, the proof of Proposition 1 does not yield an algorithm for computing GLIs, for the following reason. Building an interpolant for an unsatisfiable conjunction of formulae $\varphi \wedge \psi$ is typically the job of the decision procedure that proves the unsatisfiability and, in general, there is no such procedure when $\varphi$ and $\psi$ contain predicates and have non-trivial quantifier alternation. In this case, some provers use instantiation heuristics for the universal quantifiers that are sufficient for proving unsatisfiability; however, these heuristics are not always suitable for interpolant generation. Consequently, from now on, we assume the existence of an effective Lyndon interpolation procedure only for decidable theories, such as the quantifier-free linear integer (real) arithmetic with uninterpreted functions (UFLIA, UFLRA, etc.) [26].

This is where the predicate-free path formulae (defined in Sect. 2.1) come into play. Recall that, for a given event sequence $\alpha$, the automaton $\mathcal{A}$ accepts a word $w$ such that $w_\Sigma = \alpha$ if and only if $\widehat{\Upsilon}(\alpha)$ is satisfiable (Lemma 5). Assuming further that the equality and interpreted predicate atoms (e.g., inequalities over the integers) from the transition rules of $\mathcal{A}$ belong to a decidable first-order theory, such as Presburger arithmetic, Lemma 5 gives us an effective way of checking emptiness of $\mathcal{A}$, relative to a given event sequence. However, this method does not cope well with lazy annotation, because there is no way to extract, from the unsatisfiability proof of $\widehat{\Upsilon}(\alpha)$, the interpolants needed to annotate $\alpha$. This is


Alternating Automata Modulo First Order Theories 57 


because (I) the formula $\widehat{\Upsilon}(\alpha)$, obtained by repeated substitutions, loses track of the steps of the execution, and (II) quantifiers that occur nested in $\widehat{\Upsilon}(\alpha)$ make it difficult to write $\widehat{\Upsilon}(\alpha)$ as an unsatisfiable quantifier-free conjunction of formulae from which interpolants are extracted (Definition 4).

The solution we adopt for the first issue (I) consists in partially recovering the time-stamped structure of the acceptance formula $\Upsilon(\alpha)$ using the formula $\overline{\Upsilon}(\alpha)$, in which only transition quantifiers occur. The second issue (II) is solved under the additional assumption that the theory of the data domain D has witness-producing quantifier elimination. More precisely, we assume that, for each formula $\exists x \,.\, \varphi(x)$, there exists an effectively computable term $\tau$, in which $x$ does not occur, such that $\exists x \,.\, \varphi$ and $\varphi[\tau/x]$ are equisatisfiable. These terms, called witness terms in the following, are actual definitions of the Skolem function symbols from the following folklore theorem:


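Theorem 3 ([3]). Given $Q_1 x_1 \ldots Q_n x_n \,.\, \varphi$ a first-order sentence, where $Q_1, \ldots, Q_n \in \{\exists, \forall\}$ and $\varphi$ is quantifier-free, let $\tau_i \overset{\mathrm{def}}{=} f_i(y_1, \ldots, y_{k_i})$ if $Q_i = \forall$ and $\tau_i \overset{\mathrm{def}}{=} x_i$ if $Q_i = \exists$, where $f_i$ is a fresh function symbol and $\{y_1, \ldots, y_{k_i}\} = \{x_j \mid j < i,\ Q_j = \exists\}$. Then the entailment $Q_1 x_1 \ldots Q_n x_n \,.\, \varphi \models \varphi[\tau_1/x_1, \ldots, \tau_n/x_n]$ holds.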
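The substitution of Theorem 3 depends only on the quantifier prefix: each universally quantified variable is replaced by a fresh function symbol applied to the existentially quantified variables that precede it, and existential variables are kept. The sketch below computes this substitution as strings; the encoding and the naming scheme `f<i>` are illustrative assumptions. For the prefix $\exists z_1 \forall z_2$ of the running example, the witness term $\tau_2 \equiv z_1$ used later in Example 3 is one interpretation of `f2(z1)`.

```python
def witness_substitution(prefix):
    """prefix: list of (quantifier, variable) pairs, quantifier in {"E", "A"}.

    Returns a dict x_i -> tau_i, where a universal x_i is mapped to the term
    f_i(y_1, ..., y_k) over the existential variables that precede it, and an
    existential x_i is mapped to itself."""
    subst, seen_exists = {}, []
    for i, (quant, var) in enumerate(prefix, start=1):
        if quant == "A":
            subst[var] = f"f{i}({', '.join(seen_exists)})"
        elif quant == "E":
            subst[var] = var
            seen_exists.append(var)
        else:
            raise ValueError(f"unknown quantifier {quant!r}")
    return subst


print(witness_substitution([("E", "z1"), ("A", "z2")]))
# {'z1': 'z1', 'z2': 'f2(z1)'}
```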


Examples of witness-producing quantifier elimination procedures can be found in the literature for, e.g., linear integer (real) arithmetic (LIA, LRA), Presburger arithmetic, and the Boolean algebra of sets with Presburger cardinality constraints (BAPA) [18].

Under the assumption that witness terms can be effectively built, we describe the generation of a non-local GLI for a given input event sequence $\alpha = a_1 \ldots a_n$. First, we generate successively the acceptance formula $\Upsilon(\alpha)$ and its equisatisfiable forms $\overline{\Upsilon}(\alpha) = Q_1 z_1 \ldots Q_m z_m \,.\, \Phi$ and $\widehat{\Upsilon}(\alpha) = Q_1 z_1 \ldots Q_m z_m \,.\, \widehat{\Phi}$, both written in prenex form, with matrices $\Phi$ and $\widehat{\Phi}$, respectively. Because we assumed that the first-order theory of D has quantifier elimination, the satisfiability problem for $\widehat{\Upsilon}(\alpha)$ is decidable. If $\widehat{\Upsilon}(\alpha)$ is satisfiable, we build a counterexample for emptiness $w$ such that $w_\Sigma = \alpha$ and $w_D$ is a satisfying assignment for $\widehat{\Upsilon}(\alpha)$. Otherwise, $\widehat{\Upsilon}(\alpha)$ is unsatisfiable and there exist witness terms $\tau_{i_1}, \ldots, \tau_{i_\ell}$, where $\{i_1, \ldots, i_\ell\} = \{j \in [1,m] \mid Q_j = \forall\}$, such that $\widehat{\Phi}[\tau_{i_1}/z_{i_1}, \ldots, \tau_{i_\ell}/z_{i_\ell}]$ is unsatisfiable (Theorem 3). It then turns out that the formula $\Phi[\tau_{i_1}/z_{i_1}, \ldots, \tau_{i_\ell}/z_{i_\ell}]$, obtained analogously from the matrix of $\overline{\Upsilon}(\alpha)$, is unsatisfiable as well (Lemma 6 below). Because this latter formula is structured as a conjunction of formulae $\varphi_0 \wedge \varphi_1 \wedge \ldots \wedge \varphi_n \wedge \psi$, where $\mathcal{V}(\varphi_k) \cap Q^{(\leq n)} \subseteq Q^{(k-1)} \cup Q^{(k)}$ and $\mathcal{V}(\psi) \cap Q^{(\leq n)} \subseteq Q^{(n)}$, it is now possible to use an existing interpolation procedure for the quantifier-free theory of D, extended with uninterpreted function symbols, to compute a (not necessarily local) GLI $(I_0, \ldots, I_n)$ such that $\mathcal{V}(I_k) \cap Q^{(\leq n)} \subseteq Q^{(k)}$, for all $k \in [n]$.


Example 3 (Contd. from Examples 1 and 2). The formula $\widehat{\Upsilon}(\alpha)$ (Example 2) is unsatisfiable; let $\tau_2 \equiv z_1$ be the witness term for the universally quantified variable $z_2$. Replacing $z_2$ with $\tau_2(z_1)$ in the matrix of $\overline{\Upsilon}(\alpha)$ (Example 1) yields the unsatisfiable conjunction below, obtained after trivial simplifications:

$$[z_1 > 0 \wedge q^{(0)}(z_1)] \wedge [q^{(0)}(z_1) \rightarrow x^{(1)} \geq 0 \wedge q_f^{(1)}(x^{(1)} + z_1)] \wedge [q_f^{(1)}(x^{(1)} + z_1) \rightarrow x^{(1)} + z_1 \leq 0 \wedge q_f^{(2)}(x^{(2)} + x^{(1)} + z_1)]$$

A non-local GLI for the above conjunction is the sequence of formulae:

$$(q^{(0)}(z_1) \wedge z_1 > 0,\ \ x^{(1)} \geq 0 \wedge q_f^{(1)}(x^{(1)} + z_1) \wedge z_1 > 0,\ \ \bot)$$


We formalize and prove the correctness of the above construction of non-local GLIs. A function $\xi : \mathbb{N} \rightarrow \mathbb{N}$ is monotonic if and only if for each $n \leq m$ we have $\xi(n) \leq \xi(m)$, and finite-range if and only if for each $n \in \mathbb{N}$ the set $\{m \mid \xi(m) = n\}$ is finite. If $\xi$ is finite-range, we denote by $\xi^{-1}_{\max}(n) \in \mathbb{N}$ the maximal value $m$ such that $\xi(m) = n$.
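On a finite domain $[1,m]$, as used in Lemma 6 below, every function is trivially finite-range, so only monotonicity needs checking and $\xi^{-1}_{\max}$ is a maximum over a finite preimage. In the sketch below, $\xi$ is a Python dict; the names are illustrative. For the running example, $z_1$ first occurs in $\Theta(\alpha_0)$ and $z_2$ in $\Theta(\alpha_1)$, giving $\xi = \{1 \mapsto 0, 2 \mapsto 1\}$.

```python
def is_monotonic(xi):
    """xi: dict from positions 1..m to indices; checks n <= m implies xi(n) <= xi(m).
    Comparing consecutive positions suffices, by transitivity of <=."""
    keys = sorted(xi)
    return all(xi[a] <= xi[b] for a, b in zip(keys, keys[1:]))


def xi_max_inv(xi, n):
    """Largest m with xi(m) == n, or None when n has no preimage."""
    preimage = [m for m, v in xi.items() if v == n]
    return max(preimage) if preimage else None


xi = {1: 0, 2: 1}   # z1 first occurs in Theta(alpha_0), z2 in Theta(alpha_1)
print(is_monotonic(xi), xi_max_inv(xi, 0), xi_max_inv(xi, 1))   # True 1 2
```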


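Lemma 6. Given a non-empty input event sequence $\alpha = a_1 \ldots a_n \in \Sigma^*$ such that $\Upsilon(\alpha)$ is unsatisfiable, let $Q_1 z_1 \ldots Q_m z_m \,.\, \Phi$ be a prenex form of $\overline{\Upsilon}(\alpha)$ and let $\xi : [1,m] \rightarrow [n]$ be a monotonic finite-range function mapping each transition quantifier to the minimal index from the sequence $\Theta(\alpha_0), \ldots, \Theta(\alpha_n)$ where it occurs. Then one can effectively build:

1. witness terms $\tau_{i_1}, \ldots, \tau_{i_\ell}$, where $\{i_1, \ldots, i_\ell\} = \{j \in [1,m] \mid Q_j = \forall\}$ and $\mathcal{V}(\tau_{i_j}) \subseteq X^{(\leq \xi(i_j))} \cup \{z_k \mid k < i_j,\ Q_k = \exists\}$, for all $j \in [1,\ell]$, such that $\Phi[\tau_{i_1}/z_{i_1}, \ldots, \tau_{i_\ell}/z_{i_\ell}]$ is unsatisfiable, and

2. a GLI $(I_0, \ldots, I_n)$ for $\alpha$, such that $\mathcal{V}(I_k) \subseteq Q^{(k)} \cup X^{(\leq k)} \cup \{z_j \mid j \leq \xi^{-1}_{\max}(k),\ Q_j = \exists\}$, for all $k \in [n]$.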


Consequently, under two assumptions about the first-order theory of the 
data domain, namely (i) witness-producing quantifier elimination, and (ii) Lyn- 
don interpolation for the quantifier-free fragment with uninterpreted functions, 
we developed a generic method that produces GLIs for infeasible input event
sequences. Moreover, each formula in the interpolant refers only to the current 
predicate symbols, the current and past input variables and the existentially 
quantified transition variables introduced at the previous steps. The remaining 
questions are how to use these GLIs to label the sequences in the unfolding of 
an automaton (Definition 2) and compute coverage (Definition 3) between nodes 
of the unfolding. 


4.1 Unfolding with Non-local Interpolants 


As required by Definition 2, the unfolding $U$ of an automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$ is labeled by formulae $U(\alpha) \in \mathrm{Form}^+(Q, \emptyset)$, with no free symbols other than predicate symbols, such that the labeling is compatible with the transition relation of the automaton. Each newly expanded input sequence of $\mathcal{A}$ is initially labeled with $\top$ and the labels are refined using GLIs computed from proofs of spuriousness. The following lemma describes the refinement of the labeling of an input sequence by a non-local GLI:
Lemma 7. Let $U$ be an unfolding of an automaton $\mathcal{A} = (\Sigma, X, Q, \iota, F, \Delta)$ such that $\alpha = a_1 \ldots a_n \in dom(U)$ and $(I_0, \ldots, I_n)$ is a GLI for $\alpha$. Then the mapping $U' : dom(U) \rightarrow \mathrm{Form}^+(Q, \emptyset)$ is an unfolding of $\mathcal{A}$, where:

- $U'(\alpha_k) = U(\alpha_k) \wedge J_k$, for all $k \in [n]$, where $J_k$ is the formula obtained from $I_k$ by removing the time stamp of each predicate symbol $q^{(k)}$ and existentially quantifying each free variable, and

- $U'(\beta) = U(\beta)$ if $\beta \in dom(U)$ and $\beta \not\preceq \alpha$.

Moreover, $\alpha$ is safe in $U'$.


Observe that, by Lemma 6(2), the set of free variables of a GLI formula $I_k$ consists of (i) variables $X^{(\leq k)}$ keeping track of data values seen in the input at some earlier moment in time, and (ii) variables that track past choices made
within the transition rules. Basically, it is not important when exactly in the past 
a certain input has been read or when a choice has been made, because only 
the relation between the values of these and the current variables determines 
the future behavior of the automaton. Quantifying these variables existentially 
does the job of ignoring when exactly in the past these values have been seen. 
Moreover, the last point of Lemma 7 ensures that the refined path is safe in the 
new unfolding and will stay safe in all future refinements of this unfolding. 
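The transformation from $I_k$ to $J_k$ in Lemma 7 is syntactic: strip the time stamp from every predicate atom and existentially close the formula over its free variables. The sketch below applies it to the second formula of the GLI from Example 3, $x^{(1)} \geq 0 \wedge q_f^{(1)}(x^{(1)} + z_1) \wedge z_1 > 0$. The tuple encoding (with the time-stamped input variable $x^{(1)}$ written as the name `x1`) and the function name are illustrative assumptions.

```python
def label_from_interpolant(interp):
    """Build J_k from I_k as in Lemma 7: drop the time stamp of every predicate
    atom and existentially quantify every free variable.

    Hypothetical encoding: ("pred", q, stamp, args), ("var", name), ("const", c),
    ("app", f, args), ("atom", op, t1, t2), ("and" | "or", f, g),
    ("exists" | "forall", name, f), ("true",), ("false",)."""
    free = []  # free variables, in order of first occurrence

    def term(t, bound):
        if t[0] == "var" and t[1] not in bound and t[1] not in free:
            free.append(t[1])
        elif t[0] == "app":
            for arg in t[2]:
                term(arg, bound)

    def strip(f, bound):
        tag = f[0]
        if tag == "pred":
            for arg in f[3]:
                term(arg, bound)
            return ("pred", f[1], None, f[3])     # q^(k)(...) becomes q(...)
        if tag == "atom":
            term(f[2], bound)
            term(f[3], bound)
            return f
        if tag in ("and", "or"):
            return (tag, strip(f[1], bound), strip(f[2], bound))
        if tag in ("exists", "forall"):
            return (tag, f[1], strip(f[2], bound | {f[1]}))
        if tag in ("true", "false"):
            return f
        raise ValueError(f"unknown node {tag!r}")

    body = strip(interp, frozenset())
    for name in reversed(free):       # first-seen variable ends up outermost
        body = ("exists", name, body)
    return body


x1, z1 = ("var", "x1"), ("var", "z1")
I1 = ("and", ("atom", ">=", x1, ("const", 0)),
      ("and", ("pred", "qf", 1, (("app", "+", (x1, z1)),)),
              ("atom", ">", z1, ("const", 0))))
J1 = label_from_interpolant(I1)
print(J1[:2], J1[2][:2])   # ('exists', 'x1') ('exists', 'z1')
```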

The last ingredient of the lazy annotation semi-algorithm based on unfoldings consists in the implementation of the coverage check, when the unfolding of an automaton is labeled with conjunctions of existentially quantified formulae with predicate symbols, obtained from interpolation. By Definition 3, checking whether a given node $\alpha \in dom(U)$ is covered amounts to finding a prefix $\alpha' \preceq \alpha$ and a node $\beta \in dom(U)$ such that $U(\alpha') \models U(\beta)$, or equivalently, such that the formula $U(\alpha') \wedge \neg U(\beta)$ is unsatisfiable. However, the latter formula, in prenex form, has a quantifier prefix in the language $\exists^*\forall^*$ and, as previously mentioned, the satisfiability problem for such formulae becomes undecidable when the data theory subsumes Presburger arithmetic [10].

Nevertheless, if we require just a yes/no answer (i.e., not an interpolant), recently developed quantifier instantiation heuristics [25] perform rather well
in answering a large number of queries in this class. Observe, moreover, that 
coverage does not need to rely on a complete decision procedure. If the prover 
fails in answering the above satisfiability query, then the semi-algorithm assumes 
that the node is not covered and continues exploring its successors. Failure to 
compute complete coverage may lead to divergence (non-termination) and ulti- 
mately, to failure to prove emptiness, but does not affect the soundness of the 
semi-algorithm (real counterexamples will still be found). 
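The soundness argument above relies on treating any inconclusive prover answer as "not covered". A minimal sketch of that policy follows; `check_sat` is a hypothetical callback standing in for a prover such as Z3 with quantifier instantiation, and formulas are opaque values. Only a definite UNSAT answer for $U(\alpha') \wedge \neg U(\beta)$ establishes coverage.

```python
from enum import Enum


class Answer(Enum):
    SAT = "sat"
    UNSAT = "unsat"
    UNKNOWN = "unknown"


def is_covered_with_prover(prefix_labels, beta_label, check_sat):
    """prefix_labels: labels U(alpha') of the prefixes alpha' of alpha.

    Returns True only if some query U(alpha') and not U(beta) is proved UNSAT.
    SAT and UNKNOWN both count as "not covered", so the node is expanded;
    this may cause divergence but never hides a real counterexample."""
    for label in prefix_labels:
        query = ("and", label, ("not", beta_label))
        if check_sat(query) is Answer.UNSAT:
            return True
    return False


table = {("and", "U1", ("not", "Ub")): Answer.UNSAT}
print(is_covered_with_prover(["U0", "U1"], "Ub", lambda q: Answer.UNKNOWN))  # False
print(is_covered_with_prover(["U0", "U1"], "Ub",
                             lambda q: table.get(q, Answer.UNKNOWN)))        # True
```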


5 Experimental Results 


We have implemented a version of the IMPACT semi-algorithm [20] in a prototype tool, available online [8]. The tool is written in Java and uses the Z3 SMT solver [27], via the JavaSMT interface [15], for spuriousness and coverage queries and also for interpolant generation. Table 1 reports the size of the input automaton in bytes, the numbers of Predicates, Variables and Transitions, the result of the emptiness check, the number of Expanded and Visited Nodes during the unfolding, and the Time in milliseconds. The experiments were carried out on a macOS x64 machine with a 1.3 GHz Intel Core i5 processor and 8 GB of 1867 MHz LPDDR3 memory.

The test cases shown in Table 1 come from several sources, namely predicate automata models (*.pa) [6,7] available online [23], timed automata inclusion problems (abp.ada, train.ada, rr-crossing.foada), array logic entailments (array_rotation.ada, array_simple.ada, array_shift.ada) and hardware circuit verification (hw1.ada, hw2.ada), initially considered in [13], with the restriction that local variables are made visible in the input. The train-simpleN.foada and fischer-mutexN.foada examples are parametric verification problems in which one checks inclusions of the form $\bigcap_{i=1}^{N} \mathcal{L}(\mathcal{A}_i) \subseteq \mathcal{L}(\mathcal{B})$, where $\mathcal{A}_i$ is the $i$-th copy of the template automaton.

The advantage of using FOADA over the INCLUDER [12] tool from [13] is the 
possibility of having automata over infinite alphabets with local variables, whose 
values are not visible in the input. In particular, this is essential for checking 
inclusion of timed automata that use internal clocks to control the computation. 


6 Conclusions 


We present first-order alternating automata, a model of computation that generalizes classical Boolean alternating automata to first-order theories. Due to their expressivity, first-order alternating automata are closed under union, intersection and complement. However, the emptiness problem is undecidable even in the simplest case, that of the quantifier-free theory of equality with uninterpreted predicate symbols. We deal with the emptiness problem by developing a practical semi-algorithm that always terminates when the automaton is not empty. In case of emptiness, the semi-algorithm terminates in most practical test cases, as shown by a number of experiments.
Abstract. We present the first stable release of our tool Q3B for deciding satisfiability of quantified bit-vector formulas. Unlike other state-of-the-art solvers for this problem, Q3B is based on translation of a formula to a BDD that represents models of the formula. The tool also employs advanced formula simplifications and approximations by effective bit-width reduction and by abstraction of bit-vector operations. The paper focuses on the architecture and implementation aspects of the tool, and provides a brief experimental comparison with its competitors.


1 Introduction 


Advances in solving formula satisfiability modulo theories (SMT) achieved during 
the last few decades enabled significant progress and practical applications in the 
area of automated analysis, testing, and verification of various systems. In the 
case of software and hardware systems, the most relevant theory is the theory 
of fixed-sized bit-vectors, as these systems work with inputs expressed as bit- 
vectors (i.e., sequences of bits) and perform bitwise and arithmetic operations 
on bit-vectors. The quantifier-free fragment of this theory is supported by many 
general-purpose SMT solvers, such as CVC4 [1], MathSAT [7], Yices [10], or Z3 [9] 
and also by several dedicated solvers, such as Boolector [21] or STP [12]. How- 
ever, there are some use-cases where quantifier-free formulas are not natural or 
expressive enough. For example, formulas containing quantifiers arise naturally 
when expressing loop invariants, ranking functions, loop summaries, or when 
checking equivalence of two symbolically described sets of states [8, 13, 17,18, 24]. 
In the following, we focus on SMT solvers for quantified bit-vector formulas. In 
particular, this paper describes the state-of-the-art SMT solver Q3B, including its implementation and inner workings.

Solving quantified bit-vector formulas was first supported by Z3 in 2013 [25] and, for a limited set of exists/forall formulas with only a single quantifier alternation, by Yices in 2015 [11]. Both of these solvers decide quantified formulas by
quantifier instantiation, in which universally quantified variables in the Skolem- 
ized formula are repeatedly instantiated by ground terms until the resulting 
quantifier-free formula is unsatisfiable or a model of the original formula is found. 
In 2016, we proposed a different approach for solving quantified bit-vector formulas: by using binary decision diagrams (BDDs) and approximations [14]. For evaluation of this approach, we implemented an experimental SMT solver called Q3B, which outperformed both Z3 and Yices. The next solver able to solve quantified bit-vector formulas was Boolector in 2017, also using an approach based on quantifier instantiation [22]. Unlike Z3, in which the universally quantified variables are instantiated only by constants or subterms of the original
formula, Boolector uses a counterexample-guided synthesis approach, in which a 
suitable ground term for instantiation is synthesized based on the defined gram- 
mar. Thanks to this, Boolector was able to outperform Q3B and Z3 on certain 
classes of formulas. More recently, in 2018, support for quantified bit-vector formulas has also been implemented in CVC4 [20]. The approach of CVC4 is also based on quantifier instantiation, but instead of synthesizing terms given by the grammar as Boolector does, CVC4 uses predetermined rules based on invertibility
conditions, which directly give terms that can prune many spurious models with- 
out using potentially expensive counterexample-guided synthesis. The authors 
of CVC4 have shown that this approach outperforms Z3, CVC4, and the original 
Q3B. However, Q3B has been substantially improved since the original exper- 
imental version. In 2017, we extended it with simplifications of quantified bit- 
vector formulas using unconstrained variables [15]. Further, in 2018, we added 
the experimental implementation of abstractions of bit-vector operations [16]. 
With these techniques, Q3B is able to decide more formulas than Z3, Boolector, 
and CVC4. Besides the theoretical improvements, Q3B was also improved in 
terms of stability, ease of use, technical parts of the implementation, and com- 
pliance with the SMT-LIB standard. This tool paper presents the result of these 
improvements: Q3B 1.0, the first stable version of Q3B. 

We briefly summarize the SMT solving approach of Q3B. As in most modern SMT solvers, the input formula is first simplified using satisfiability-preserving
transformations that may reduce the size and complexity of the formula. The sim- 
plified formula is then converted to a binary decision diagram (BDD) that represents 
all assignments satisfying the formula, i.e., the models of the formula. If the BDD 
represents at least one model, we say that the BDD is satisfiable and it implies satis- 
fiability of the formula. If the BDD represents the empty set of models, we say that 
it is unsatisfiable and so is the formula. Unfortunately, there are formulas for which 
the corresponding BDD (or some of the intermediate BDDs that appear during its 
computation) is necessarily exponential in the number of bits in the formula. For 
example, this is the case for formulas that contain multiplication of two bit-vector 
variables [5]. To be able to deal with such formulas, Q3B computes in parallel also 
BDDs underapproximating and overapproximating the original set of models, i.e., 
BDDs representing subsets and supersets of the original set of models, respectively. 
The approximating BDDs may be much smaller in size than the precise BDD, espe- 
cially if the approximation is very rough. Still, they can be used to decide satisfi- 
ability of the original formula. If an overapproximating BDD is unsatisfiable, the 
original formula is also unsatisfiable. If the overapproximating BDD is satisfiable, 
we take one of its models, i.e., an assignment to the top-level existential variables of 
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the formula, and check whether it is a model of the original formula. If the answer is 
positive, the original formula is satisfiable. In the other case, we build a more pre- 
cise overapproximating BDD. Underapproximating BDDs are utilized analogously. 
The only difference is that for unsatisfiable underapproximating BDD, we check the 
validity of a countermodel, i.e., an assignment to the top-level universal variables 
that makes the formula unsatisfiable. The approach is depicted in Fig. 1. 
Fig. 1. High-level overview of the SMT solving approach used by Q3B. The three shaded
areas are executed in parallel and the first result is returned.
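The refinement scheme just described can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not Q3B's implementation:

- explicit sets of values stand in for BDDs;
- the formula is a hypothetical 4-bit predicate `phi(x, y)`, read as ∃x ∀y. phi(x, y);
- bit-widths are reduced by fixing the high bits to zero;
- the two approximations are interleaved in one loop instead of running in parallel threads.

```python
WIDTH = 4                      # bit-width of the (hypothetical) variables x and y
MASK = (1 << WIDTH) - 1


def values(bits):
    """All WIDTH-bit values of a variable whose effective bit-width is `bits`
    (the remaining high bits are fixed to zero)."""
    return range(1 << bits)


def models(phi, x_bits, y_bits):
    """Values of the existential x such that phi(x, y) holds for every value of
    the universal y; stands in for a (possibly approximating) BDD."""
    return {x for x in values(x_bits) if all(phi(x, y) for y in values(y_bits))}


def solve(phi):
    """Decide  exists x. forall y. phi(x, y)  by approximation refinement.
    Returns the answer and the effective bit-width at which it was found."""
    for bits in range(1, WIDTH + 1):
        # Overapproximation: reduce only the universal y (fewer constraints on x).
        over = models(phi, WIDTH, bits)
        if not over:
            return "unsat", bits
        # Check whether a model of the overapproximation is a real model.
        candidate = min(over)
        if all(phi(candidate, y) for y in values(WIDTH)):
            return "sat", bits
        # Underapproximation: reduce only the existential x.
        if models(phi, bits, WIDTH):
            return "sat", bits
    raise AssertionError("unreachable: at full width the approximation is precise")


print(solve(lambda x, y: x < y))                  # ('unsat', 1)
print(solve(lambda x, y: (x & y) == 0))           # ('sat', 1)
print(solve(lambda x, y: (x + y) & MASK != 5))    # ('unsat', 4)
```

The three calls show the typical outcomes:

- The first formula is refuted by a very rough overapproximation.
- The second formula is confirmed by checking a model of such an overapproximation.
- The third formula needs the full bit-width.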


Q3B currently supports two ways of computing the approximating BDDs from
the input formula. The first are variable bit-width approximations, in which the
effective bit-width of some variables is reduced. In other words, some of the
variables are represented by fewer bits, and the remaining bits are all set to zero,
all set to one, or all set to the sign bit of the reduced variable. This approach was
originally used by the SMT solvers UCLID [6] and Boolector [21]. Q3B extends this
approach to quantified formulas: if bit-widths of only existentially quantified
variables are reduced, the resulting BDD is underapproximating; if bit-widths of only
universally quantified variables are reduced, the resulting BDD is overapproximating.
The second way to obtain an approximation is bit-vector operation abstraction [16],
during which the individual bit-vector operations may not compute all bits of the
result, but produce some do-not-know bits if the resulting BDDs would exceed a given
number of nodes. An underapproximating BDD then represents assignments that satisfy
the formula for all possible values of these do-not-know bits. Analogously, an
overapproximating BDD represents all assignments that satisfy the formula for some
value of the do-not-know bits. Q3B also supports a combination of these two methods,
in which the effective bit-width of variables is reduced and, at the same time, a limit
on the size of BDDs is imposed. During an approximation refinement, either the
effective bit-width or the size limit is increased, based on the detected cause of the
imprecision.
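Both kinds of approximation can be mimicked on explicit 4-bit values. The sketch below uses hypothetical names and enumerates integers exhaustively rather than representing them as BDDs.

- `extend` fills the high bits of a reduced variable in the three ways listed above.
- `abstract_add` treats the bits in a mask as do-not-know bits.
- The underapproximation keeps the values of x that satisfy a predicate for every filling of those bits.
- The overapproximation keeps the values of x that satisfy it for some filling.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def extend(v, bits, mode):
    """Variable bit-width approximation: a WIDTH-bit variable is represented by
    its `bits` low bits v; the remaining high bits are all zeros, all ones, or
    all copies of the sign bit of the reduced variable."""
    high = MASK ^ ((1 << bits) - 1)            # mask of the filled high bits
    if mode == "zero":
        fill = False
    elif mode == "one":
        fill = True
    elif mode == "sign":
        fill = bool((v >> (bits - 1)) & 1)
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return v | high if fill else v


def abstract_add(x, c, unknown):
    """x + c where the bits in the mask `unknown` are do-not-know bits:
    returns every WIDTH-bit value the abstract result may stand for."""
    known = ((x + c) & MASK) & ~unknown
    results, sub = [], unknown
    while True:                                # enumerate all fillings of the unknown bits
        results.append(known | sub)
        if sub == 0:
            return results
        sub = (sub - 1) & unknown


def approximations(pred, c, unknown):
    """Under: x satisfies pred for all values of the do-not-know bits.
    Over: x satisfies pred for some value of the do-not-know bits."""
    under = {x for x in range(MASK + 1) if all(pred(r) for r in abstract_add(x, c, unknown))}
    over = {x for x in range(MASK + 1) if any(pred(r) for r in abstract_add(x, c, unknown))}
    return under, over


# x + 3 == 9 has the single model x = 6; abstract away the top bit of the sum.
under, over = approximations(lambda r: r == 9, 3, 0b1000)
print(sorted(under), sorted(over))               # [] [6, 14]
```

The exact model set {6} lies between the two approximations, as required. If the predicate does not depend on the do-not-know bit, the two approximations coincide.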
Fig. 2. Architecture of Q3B. Components in the shaded box are parts of Q3B; the
other components are external.


2 Architecture 


This section describes the internal architecture of Q3B. The overall structure 
including internal and external components and the interactions between them 
is depicted in Fig. 2. We explain the purpose of the internal components: 


SMT-LIB Interpreter (implemented in SMTLIBInterpreter.cpp) reads the
input file in the SMT-LIB format [3], which is the standard input format for
SMT solvers. The interpreter executes all the commands from the file. In
particular, it maintains the assertion stack and the options set by the user,
calls Solver when the check-sat command is issued, and queries Solver for a
model when the user requests it with the command get-model.

Formula Simplifier (implemented in FormulaSimplifier.cpp) provides an
interface to all applied formula simplifications, in particular miniscoping,
conversion to negation normal form, pure literal elimination, equality propagation,
constructive equality resolution (CER) [14], destructive equality resolution
(DER) [25], simple theory-related rewriting, and simplifications using
unconstrained variables. Most of these simplifications are implemented directly in
this component; only CER, DER, and the majority of the theory-related rewritings
are performed by calling the Z3 API, and simplifications using unconstrained
variables are implemented in a separate component of Q3B. The simplifier also
converts top-level existential variables to uninterpreted constants, so that their
values are included in a model. Some simplifications that could change the
models of the formula are disabled if the user enables model generation, i.e.,
sets :produce-models to true.

Unconstrained Variable Simplifier (implemented in
UnconstrainedVariableSimplifier.cpp) provides simplifications of formulas that
contain unconstrained variables, i.e., variables that occur only once in the formula.
Besides the previously published unconstrained variable simplifications [15],
which were present in the previous versions of Q3B, this component now
also provides new goal-directed simplifications of formulas with unconstrained
variables. In these simplifications, we aim to determine whether a subterm
containing an unconstrained variable should be minimized, maximized, sign
minimized, or sign maximized in order to satisfy the formula. If the subterm
should be minimized and contains an unconstrained variable, the term is
replaced by a simpler term that gives the minimal result that can be achieved
by any value of the unconstrained variable. Maximization, sign minimization,
and sign maximization are handled similarly.

Solver (implemented in Solver.cpp) is the central component of our tool. It calls
the formula simplifier and then creates three threads for the precise solver, the
underapproximating solver, and the overapproximating solver. It also controls
the approximation refinement loops of the approximating solvers. Finally, it
returns the result of the fastest thread and, if the result was sat, stores the
respective model.

Formula to BDD Transformer (implemented in the file ExprToBDDTransformer.cpp)
performs the actual conversion of a formula to a BDD. Each subterm of the input
formula is converted to a vector of BDDs (if the subterm's sort is a bit-vector of
width n, then the constructed vector contains n BDDs, each representing one bit of
the subterm). Further, each subformula of the input formula is converted to a BDD.
These conversions proceed by a straightforward bottom-up recursion on the formula
syntax tree. The transformer component calls an external library to compute the
effect of logical and bit-vector operations on BDDs and vectors of BDDs,
respectively. Besides the precise conversion, the transformer can also construct
overapproximating and underapproximating BDDs. The precision of approximations
depends on parameters set by the solver component.

Cache (implemented as a part of ExprToBDDTransformer.cpp) maintains for 
each converted subformula and subterm the corresponding BDD or a vector 
of BDDs, respectively. Each of the three solvers has its own cache. When an 
approximating solver increases precision of the approximation, entries of its 
cache that can be affected by the precision change are invalidated. All the
caches are internally implemented by hash tables.
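The goal-directed simplifications of the Unconstrained Variable Simplifier replace a term op(a, u), where u is unconstrained, by the extreme value reachable through u. The sketch below checks a few such replacement rules exhaustively on 4-bit vectors. The rule tables are illustrative assumptions written for this example, not the rules actually implemented in Q3B.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def to_signed(v):
    """Interpret a WIDTH-bit value in two's complement."""
    return v - (1 << WIDTH) if v & SIGN else v


# op(a, u) where u is unconstrained and a is an arbitrary term.
OPS = {
    "bvand": lambda a, u: a & u,
    "bvor": lambda a, u: a | u,
    "bvadd": lambda a, u: (a + u) & MASK,
    "bvmul": lambda a, u: (a * u) & MASK,
}

# Hypothetical rewrite tables: the replacement for op(a, u) under each goal.
MINIMIZE = {"bvand": lambda a: 0, "bvor": lambda a: a,
            "bvadd": lambda a: 0, "bvmul": lambda a: 0}
MAXIMIZE = {"bvand": lambda a: a, "bvor": lambda a: MASK, "bvadd": lambda a: MASK}
SIGN_MINIMIZE = {"bvand": lambda a: a & SIGN, "bvor": lambda a: a | SIGN,
                 "bvadd": lambda a: SIGN}


def check(rules, pick):
    """True if, for every a, the replacement equals the best value that any
    choice of the unconstrained u can achieve (exhaustive over WIDTH bits)."""
    return all(
        rules[op](a) == pick([OPS[op](a, u) for u in range(MASK + 1)])
        for op in rules
        for a in range(MASK + 1)
    )


print(check(MINIMIZE, min))                                     # True
print(check(MAXIMIZE, max))                                     # True
print(check(SIGN_MINIMIZE, lambda rs: min(rs, key=to_signed)))  # True
```

Such an exhaustive check on small widths is a cheap sanity test for a rule table. It does not replace a proof that the rules hold for every bit-width.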


3 Implementation 


Q3B is implemented in C++17, is open-source, and is available under the MIT license
on GitHub: https://github.com/martinjonas/Q3B. The project development 
process includes continuous integration and automatic regression tests. 

Q3B relies on several external libraries and tools. For representation and
manipulation of BDDs, Q3B uses the open-source library CUDD 3.0 [23].
Since CUDD does not support bit-vector operations, we use the library by Peter
Navratil [19] that implements bit-vector operations on top of CUDD. The
algorithms in this library are inspired by the ones in the BDD library BuDDy¹, and
they provide decent performance. Nevertheless, we have further improved its
performance by several modifications. In particular, we added specialized code for
handling expensive operations like bit-vector multiplication and division when
arguments contain constant BDDs. This, for example, considerably speeds up
multiplication whenever one argument contains many constant zero bits, which is a
frequent case when we use the variable bit-width approximation fixing some bits
to zero. Further, we have fixed a few incorrectly implemented bit-vector operations
in the original library. Finally, we have extended the library with support
for do-not-know bits in inputs of the bit-vector operations, and we have
implemented abstract versions of arithmetic operations that can produce do-not-know
bits when the result exceeds a given number of BDD nodes.
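The next sketch illustrates two things: the vector-of-BDDs representation of a subterm, and the shortcut for multiplication with constant bits. Integer truth tables stand in for BDDs here. This is an assumption: truth tables grow exponentially, while BDDs are usually far more compact, but both support the same Boolean operations. `mul` is a hypothetical shift-and-add multiplier that skips multiplier bits that are the constant zero. That situation arises after y is reduced to one bit with zero extension.

```python
WIDTH = 4                       # bit-width of the hypothetical variables x and y
N_ASSIGN = 1 << (2 * WIDTH)     # inputs 0..3 are the bits of x, inputs 4..7 those of y

# Toy stand-in for BDDs (assumption): a Boolean function of the 8 input bits is a
# truth table stored as an integer whose bit `a` is its value under assignment `a`.
# Constant false is 0, so detecting a constant-zero bit is a comparison with 0.


def input_bit(j):
    """Truth table of the j-th input bit."""
    return sum(1 << a for a in range(N_ASSIGN) if (a >> j) & 1)


def add(p, q):
    """Ripple-carry addition of two bit vectors (least significant bit first)."""
    out, carry = [], 0
    for pi, qi in zip(p, q):
        out.append(pi ^ qi ^ carry)
        carry = (pi & qi) | (carry & (pi ^ qi))
    return out


def mul(p, q):
    """Shift-and-add multiplication modulo 2**WIDTH that skips the partial
    products of multiplier bits that are the constant zero."""
    acc, additions = [0] * WIDTH, 0
    for i, qi in enumerate(q):
        if qi == 0:                       # constant-zero bit: nothing to add
            continue
        partial = [0] * i + [qi & pj for pj in p[: WIDTH - i]]
        acc = add(acc, partial)
        additions += 1
    return acc, additions


def value(vec, a):
    """Integer value of a bit vector under assignment a."""
    return sum(((bit >> a) & 1) << i for i, bit in enumerate(vec))


x = [input_bit(i) for i in range(WIDTH)]
y = [input_bit(WIDTH + i) for i in range(WIDTH)]
y_reduced = [y[0]] + [0] * (WIDTH - 1)   # y reduced to 1 bit, high bits fixed to zero

full, n_full = mul(x, y)
reduced, n_reduced = mul(x, y_reduced)
print(n_full, n_reduced)                 # 4 1
```

Here the reduced multiplier needs one addition instead of four. With real BDDs, each skipped addition avoids building the BDDs of one partial sum.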

For parsing the input formulas in SMT-LIB format, Q3B uses an ANTLR parser
generated from the grammar² for SMT-LIB 2.6 [2]. We have modified the grammar
to correctly handle bit-vector numerals and to support push and pop commands
without a numerical argument. The parser allows Q3B to support all bit-vector
operations and almost all SMT-LIB commands except get-assertions,
get-assignment, get-proof, get-unsat-assumptions, get-unsat-core, and 
all the commands that work with algebraic data-types. This is in sharp contrast 
with the previous experimental versions of Q3B, which only collected all the 
assertions from the input file and performed the satisfiability check regardless 
of the rest of the commands and of the presence of the check-sat command. 
The reason for this was that the older versions parsed the input file using the 
Z3 C++ API, which can provide only the list of assertions, not the rest of the 
SMT-LIB script. Thanks to the new parser, Q3B 1.0 can also provide the user 


1 https://sourceforge.net/projects/buddy/.
2 https://github.com/julianthome/smtlibv2-grammar.
with a model of a satisfiable formula after calling get-model; this important
feature of other SMT solvers was completely missing in the previous versions.
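The assertion stack maintained by the SMT-LIB Interpreter, including push and pop without a numeral, can be sketched as follows. This is a hypothetical Python model, not Q3B's C++ code. It assumes that an omitted numeral means 1.

```python
class AssertionStack:
    """Toy model of SMT-LIB assertion-stack bookkeeping (hypothetical, not Q3B's code)."""

    def __init__(self):
        self.levels = [[]]                 # level 0: assertions made before any push

    def push(self, n=1):                   # assumption: a missing numeral means 1
        for _ in range(n):
            self.levels.append([])

    def pop(self, n=1):
        if n >= len(self.levels):
            raise ValueError("pop exceeds the number of pushed levels")
        del self.levels[len(self.levels) - n:]

    def add(self, formula):                # the effect of (assert formula)
        self.levels[-1].append(formula)

    def assertions(self):
        """The formulas that a check-sat command would pass to the solver."""
        return [f for level in self.levels for f in level]


stack = AssertionStack()
stack.add("(bvult x y)")
stack.push()                               # (push) without a numeral
stack.add("(= x #x0)")
print(stack.assertions())                  # ['(bvult x y)', '(= x #x0)']
stack.pop()                                # (pop) without a numeral
print(stack.assertions())                  # ['(bvult x y)']
```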

On the other hand, the C++ API of the Z3 solver is still used for the internal
representation of parsed formulas. The Z3 C++ API is also used to perform
manipulations of formulas, such as substitution of values for variables, and some of
the formula simplifications. Note that these are the only uses of the Z3 API in Q3B
while solving the formula; no actual SMT- or SAT-solving capabilities of Z3 are
used during the solving process.

Some classes of Q3B, in particular Solver, FormulaSimplifier, and
UnconstrainedVariableSimplifier, expose a public C++ API that can be
used by external tools for SMT solving or just for formula simplification.
For example, Solver exposes the method Solve(formula, approximationType),
which can be used to decide satisfiability by the precise solver, the
underapproximating solver, or the overapproximating solver. Solver also exposes the
method SolveParallel(formula), which simplifies the input formula, runs all three
of these solvers in parallel, and returns the first result as depicted in Fig. 1.
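The behaviour of SolveParallel (racing three solvers and keeping the first answer) can be imitated with Python's `concurrent.futures`. The solver functions below are hypothetical stand-ins that only sleep. The sketch is not Q3B's C++ implementation.

```python
import concurrent.futures as cf
import time


def solve_parallel(solvers, formula):
    """Run every solver on `formula` in its own thread and return the name and
    result of the first one that finishes."""
    pool = cf.ThreadPoolExecutor(max_workers=len(solvers))
    futures = {pool.submit(run, formula): name for name, run in solvers.items()}
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    first = next(iter(done))
    # Python threads cannot be killed: the losing solvers keep running until they
    # finish, so a real implementation needs cooperative cancellation.
    pool.shutdown(wait=False)
    return futures[first], first.result()


def stand_in(delay, answer):
    """Hypothetical solver that 'works' for `delay` seconds."""
    def run(formula):
        time.sleep(delay)
        return answer
    return run


winner, result = solve_parallel(
    {"precise": stand_in(0.3, "sat"), "under": stand_in(0.01, "sat"), "over": stand_in(0.2, "sat")},
    "(exists ((x (_ BitVec 8))) (= x x))",
)
print(winner, result)                      # under sat
```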


4 Experimental Evaluation 


We have evaluated the performance of Q3B 1.0 and compared it to the latest
versions of the SMT solvers Boolector (v3.0), CVC4 (v1.6), and Z3 (v4.8.4). All
tools were used with their default settings except for CVC4, where we used the
same settings as in the paper that introduces quantified bit-vector solving in
CVC4 [20], since they give better results than the default CVC4 settings. As
the benchmark set, we have used all 5751 quantified bit-vector formulas from
the SMT-LIB repository. The benchmarks are divided into 8 distinct families of
formulas. We have executed each solver on each benchmark with a CPU time limit
of 20 min and a RAM limit of 8 GiB. All the experiments were performed in an
Ubuntu 16.04 virtual machine within a computer equipped with an Intel(R)
Core(TM) i7-8700 CPU @ 3.20 GHz and 32 GiB of RAM. For reliable benchmarking,
we employed BenchExec [4], a tool that allocates specified resources for a program
execution and precisely measures their usage. All scripts used for running
benchmarks and processing their results, together with detailed descriptions and
some additional results not presented in the paper, are available online³.

Table 1 shows the numbers of benchmarks in each benchmark family solved
by the individual solvers. Q3B solves at least as many benchmarks as every other
solver in the benchmark families 2017-Preiner-scholl-smt08, 2017-Preiner-tptp,
2017-Preiner-UltimateAutomizer, 2018-Preiner-cav18, and wintersteiger (strictly
the most in 2017-Preiner-scholl-smt08 and wintersteiger), and it is competitive
in the remaining families. In total, Q3B also solves more formulas than each of
the other solvers: 116 more than Boolector, 83 more than CVC4, and 139 more
than Z3. Although the numbers of solved formulas for the solvers seem fairly
similar, the cross-comparison in Table 2 shows that the differences among the
individual solvers are actually larger. For each other solver, there are at least

3 https://github.com/martinjonas/q3b-artifact.
Table 1. For each solver and benchmark family, the table shows the number of
benchmarks from the given family solved by the given solver. The column Total shows the
total number of benchmarks in the given family. The last line provides the total CPU
times for the benchmarks solved by all four solvers.


Family                                Total  Boolector   CVC4    Q3B     Z3
2017-Preiner-keymaera                  4035       4022   3998   4009   4031
2017-Preiner-psyco                      194        193    190    182    194
2017-Preiner-scholl-smt08               374        312    248    319    272
2017-Preiner-tptp                        73         69     73     73     73
2017-Preiner-UltimateAutomizer          153        152    151    153    153
20170501-Heizmann-UltimateAutomizer     131         30    128    124     32
2018-Preiner-cav18                      600        553    565    565    553
wintersteiger                           191        163    174    185    163
Total                                  5751       5494   5527   5610   5471
CPU time [s]                                      7794   5877  19853   4055


Table 2. For all pairs of the solvers, the table shows the number of benchmarks 
that were solved by the solver in the corresponding row, but not by the solver in the 
corresponding column. The column Uniquely solved shows the number of benchmarks 
that were solved only by the given solver. 


           Boolector  CVC4   Q3B    Z3  Uniquely solved
Boolector          0   123    69    78                8
CVC4             156     0    60   171                6
Q3B              185   143     0   208               25
Z3                55   115    69     0                6


143 benchmarks that can be solved by Q3B but not by the other solver. We 
think this shows the importance of developing an SMT solver based on BDDs and 
approximations besides the solvers based on quantifier instantiation. 
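The two tables are related by simple set arithmetic. Let A and B be the sets of benchmarks solved by two solvers. Then |A \ B| − |B \ A| = |A| − |B| and |A ∩ B| = |A| − |A \ B|. The following Python check uses only the numbers reported in Tables 1 and 2. It shows how the small net differences in Table 1 hide much larger one-sided differences.

```python
# Numbers of solved benchmarks (Table 1) and pairwise counts (Table 2):
# ONLY[(a, b)] = benchmarks solved by a but not by b.
TOTAL = {"Boolector": 5494, "CVC4": 5527, "Q3B": 5610, "Z3": 5471}
ONLY = {
    ("Boolector", "CVC4"): 123, ("Boolector", "Q3B"): 69, ("Boolector", "Z3"): 78,
    ("CVC4", "Boolector"): 156, ("CVC4", "Q3B"): 60, ("CVC4", "Z3"): 171,
    ("Q3B", "Boolector"): 185, ("Q3B", "CVC4"): 143, ("Q3B", "Z3"): 208,
    ("Z3", "Boolector"): 55, ("Z3", "CVC4"): 115, ("Z3", "Q3B"): 69,
}

# For finite sets A, B:  |A \ B| - |B \ A| = |A| - |B|  and  |A & B| = |A| - |A \ B|.
for (a, b), n in ONLY.items():
    assert n - ONLY[(b, a)] == TOTAL[a] - TOTAL[b], (a, b)

for other in ("Boolector", "CVC4", "Z3"):
    gained, lost = ONLY[("Q3B", other)], ONLY[(other, "Q3B")]
    print(f"Q3B vs {other}: +{gained} -{lost} (net {gained - lost}), "
          f"both solve {TOTAL['Q3B'] - gained}")
# Q3B vs Boolector: +185 -69 (net 116), both solve 5425
# Q3B vs CVC4: +143 -60 (net 83), both solve 5467
# Q3B vs Z3: +208 -69 (net 139), both solve 5402
```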


5 Conclusions and Future Work 


We have described the architecture and inner workings of the first stable version
of the state-of-the-art SMT solver Q3B. Experimental evaluation on all quantified
bit-vector formulas from the SMT-LIB repository shows that this solver slightly
outperforms other state-of-the-art solvers for such formulas.

As future work, we would like to drop the dependency on the Z3 API: namely 
to implement our own representation of formulas and reimplement all the sim- 
plifications currently outsourced to Z3 API directly in Q3B. We also plan to 
extend some simplifications with the additional bookkeeping needed to construct
a model of the original formula. With these extensions, all simplifications could
be used even if the user wants to get a model of the formula. We would also like
to implement production of unsatisfiable cores, since they are also valuable for
software verification.
Abstract. We present CVC4SY, a syntax-guided synthesis (SyGuS) solver based
on three bounded term enumeration strategies. The first encodes term enumer- 
ation as an extension of the quantifier-free theory of algebraic datatypes. The 
second is based on a highly optimized brute-force algorithm. The third combines 
elements of the others. Our implementation of the strategies within the satisfiabil- 
ity modulo theories (SMT) solver CVC4 and a heuristic to choose between them 
leads to significant improvements over state-of-the-art SyGuS solvers. 


1 Introduction 


Syntax-guided synthesis (SyGuS) [3] is a recent paradigm for program synthesis, suc- 
cessfully used for applications in formal verification and programming languages. Most 
SyGuS solvers perform counterexample-guided inductive synthesis (CEGIS) [16]: a 
refinement loop in which a learner proposes solutions, and a verifier, generally a satisfi- 
ability modulo theories (SMT) solver [8,9], checks them and provides counterexamples 
for failures. Generally, the learner enumerates some set of terms, while pruning spuri- 
ous ones [17]. The simplicity and efficacy of enumerative SyGuS have made it the de 
facto approach for SyGuS, although alternatives exist for restricted fragments [4, 14]. 

In previous work [14], we have shown how the SMT solver CVC4 [5] can itself act as
an efficient synthesizer. This tool paper focuses on recent advances in the enumerative
subsolver of CVC4, culminating in the current SyGuS solver CVC4SY. Figure 1 shows
its main components. The term enumerator is parameterized by an enumeration strategy
chosen before solving: CVC4SY-S, whose constraint-based (smart) enumeration allows
for numerous optimizations (Sect. 2); CVC4SY-F, based on a new approach for (fast)
enumerative synthesis (Sect. 3), which has significant advantages with respect to the
enumerative solver CVC4SY-S and other state-of-the-art approaches; and CVC4SY-H,
based on a hybrid approach combining smart and fast enumeration (Sect. 4). All
strategies are fully integrated in CVC4, meaning they support inputs in many background
theories, including arithmetic, bit-vectors, strings, and floating point. We evaluate these
approaches on a large set of benchmarks (Sect. 5).


© The Author(s) 2019 
Fig. 1. Architecture of CVC4SY.


The Problem. A syntax-guided synthesis problem for a function $f$ in a background
theory $T$ consists of a set of semantic restrictions, or specification, for $f$ given by a
(second-order) $T$-formula of the form $\exists f.\, \varphi[f]$, and a set of syntactic
restrictions on the solutions for $f$, typically expressed as a context-free grammar. An
enumerative approach to this problem combines a term enumerator and a solution verifier
for solving synthesis conjectures. The role of the term enumerator is to output a stream
of terms $t_1, t_2, \dots$ over some tuple $\bar{x}$ of variables representing the inputs
of $f$, where each $t_i[\bar{x}]$ is a candidate solution. The role of the solution verifier
is to check for each $t_i$ whether it is a solution for $f$ by determining if the negated
conjecture $\neg\varphi[\lambda \bar{x}.\, t_i]$ is unsatisfiable.

Bounded term generation considers terms based on an ordering such as term size
(the number of non-nullary symbols in a term). For each $k = 0, 1, 2, \dots$, the term
enumerator outputs a finite set $S_k$ of terms, each of size at most $k$. Bounded term
generation in CVC4SY is complete in the sense that, for any $k$, if $f$ has a solution of
size at most $k$, then at least one of the terms in $S_k$ is a solution for $f$. The
effectiveness of an approach for (complete) bounded term generation can be evaluated
based on two criteria: (i) the number of terms it generates and (ii) the rate at which it
generates them.
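A minimal enumerative CEGIS loop can illustrate both the term enumerator, which yields the terms of each size k in turn, and the verifier. It works over the integer/Boolean grammar R introduced below in Eqs. (1) and (2). This is a sketch under stated assumptions, not CVC4SY's algorithm:

- The hypothetical specification asks for the maximum of x and y.
- The SMT-based verifier is replaced by an exhaustive check on a small integer domain.
- Counterexamples are kept as refinement points, so later candidates are tested against them first.

```python
from itertools import product

# Grammar R of Eqs. (1)-(2): constructor name and argument sorts.
GRAMMAR = {
    "I": [("0", ()), ("1", ()), ("x", ()), ("y", ()),
          ("+", ("I", "I")), ("-", ("I", "I")), ("ite", ("B", "I", "I"))],
    "B": [(">=", ("I", "I")), ("=", ("I", "I")), ("not", ("B",)), ("and", ("B", "B"))],
}
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, ">=": lambda a, b: a >= b,
       "=": lambda a, b: a == b, "not": lambda a: not a, "and": lambda a, b: a and b}


def splits(total, parts):
    """Tuples of `parts` naturals that sum to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in splits(total - first, parts - 1):
            yield (first,) + rest


def terms(sort, k):
    """Yield every term of `sort` with exactly k non-nullary symbols."""
    for name, args in GRAMMAR[sort]:
        if not args:
            if k == 0:
                yield name
            continue
        for sizes in splits(k - 1, len(args)):
            pools = [list(terms(s, n)) for s, n in zip(args, sizes)]
            for children in product(*pools):
                yield (name, *children)


def evaluate(t, env):
    if isinstance(t, str):
        return env[t] if t in env else int(t)
    op, *args = t
    if op == "ite":
        return evaluate(args[1], env) if evaluate(args[0], env) else evaluate(args[2], env)
    return OPS[op](*(evaluate(a, env) for a in args))


def spec(t, point):
    """Hypothetical specification: f(x, y) is the maximum of x and y."""
    x, y = point
    v = evaluate(t, {"x": x, "y": y})
    return v >= x and v >= y and (v == x or v == y)


DOMAIN = list(product(range(-3, 4), repeat=2))   # stand-in for the SMT verifier


def cegis(max_size):
    points = []                                   # refinement points found so far
    for k in range(max_size + 1):                 # bounded term generation: sizes 0, 1, ...
        for t in terms("I", k):
            if not all(spec(t, p) for p in points):
                continue                          # refuted by an earlier counterexample
            cex = next((p for p in DOMAIN if not spec(t, p)), None)
            if cex is None:
                return t, points                  # verified on the whole domain
            points.append(cex)
    return None, points


solution, points = cegis(2)
print(solution)        # ('ite', ('>=', 'x', 'y'), 'x', 'y')
```

Because sizes are explored in increasing order, the first term returned is a smallest one that passes the verifier. This mirrors the completeness notion above.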

We follow two approaches for enumerative SyGuS in CVC4SY, each optimized for
one of the criteria above: a smart approach and a fast one. The first aims to generate,
reasonably quickly, the smallest set of terms that maintains completeness, whereas the
second aims to generate terms as quickly as possible.


Technical Preliminaries. As we showed in previous work [14], syntactic restrictions
can be conveniently represented as a set of (algebraic) datatypes, for which some SMT
solvers have dedicated decision procedures [7,13]. For instance, given a function
$f : (x : \mathrm{Int}) \times (y : \mathrm{Int}) \to \mathrm{Int}$ and the context-free
grammar $R$ below specifying what integer ($I$) and Boolean ($B$) terms can appear in
candidate solutions for $f$:


$I ::= 0 \mid 1 \mid x \mid y \mid I + I \mid I - I \mid \mathit{ite}(B, I, I)$    (1)
$B ::= I \geq I \mid I \approx I \mid \neg B \mid B \wedge B$    (2)

our SyGuS solver generates the following mutually recursive datatypes:


$\mathcal{I} = 0 \mid 1 \mid x \mid y \mid \mathit{plus}(\mathcal{I}, \mathcal{I}) \mid \mathit{minus}(\mathcal{I}, \mathcal{I}) \mid \mathit{ite}(\mathcal{B}, \mathcal{I}, \mathcal{I})$    (3)
$\mathcal{B} = \mathit{geq}(\mathcal{I}, \mathcal{I}) \mid \mathit{eq}(\mathcal{I}, \mathcal{I}) \mid \mathit{not}(\mathcal{B}) \mid \mathit{and}(\mathcal{B}, \mathcal{B})$    (4)


Each datatype constructor corresponds to a production rule of $R$, e.g. $\mathit{plus}$
corresponds to the rule $I ::= I + I$. A datatype term such as $\mathit{plus}(x, y)$
represents the arithmetic term $x + y$. We will use these datatypes as a running example.

For a datatype term $t$, we write $\mathit{is}_C(t)$ to denote the discriminator predicate
that is satisfied exactly when $t$ is interpreted as a datatype whose top constructor is
$C$. We write $\mathit{sel}^{\tau}_n(t)$ to denote a shared selector [15] applied to $t$,
interpreted as the $n$-th child of $t$ with type $\tau$ if one exists, and interpreted as an
arbitrary element of $\tau$ otherwise. A term consisting of zero or more consecutive nested
applications of shared selectors applied to a term $t$ is a shared selector chain (for $t$).
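The datatype encoding, discriminators, and shared selectors can be modeled directly in Python. The sketch below is an illustration with hypothetical names. In particular, the "arbitrary element" returned by a shared selector that does not apply is fixed here to one chosen value, whereas in the SMT encoding it is unconstrained.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A value of datatype I or B: a constructor applied to child values."""
    ctor: str
    children: Tuple["Term", ...] = ()


# Argument sorts of every constructor of the datatypes in Eqs. (3) and (4).
SIGNATURE = {
    "0": (), "1": (), "x": (), "y": (),
    "plus": ("I", "I"), "minus": ("I", "I"), "ite": ("B", "I", "I"),
    "geq": ("I", "I"), "eq": ("I", "I"), "not": ("B",), "and": ("B", "B"),
}
ARBITRARY = {"I": Term("0"), "B": Term("eq", (Term("0"), Term("0")))}


def is_(ctor, t):
    """Discriminator is_C(t)."""
    return t.ctor == ctor


def sel(sort, n, t):
    """Shared selector sel^sort_n(t): the n-th child of t of the given sort
    (1-based), or an arbitrary (here: fixed) element of that sort otherwise."""
    kids = [c for c, s in zip(t.children, SIGNATURE[t.ctor]) if s == sort]
    return kids[n - 1] if 1 <= n <= len(kids) else ARBITRARY[sort]


INFIX = {"plus": "+", "minus": "-", "geq": ">=", "eq": "=", "and": "and"}


def to_theory(t):
    """The arithmetic/Boolean term represented by datatype term t."""
    if not t.children:
        return t.ctor
    args = [to_theory(c) for c in t.children]
    if t.ctor in INFIX:
        return f"({args[0]} {INFIX[t.ctor]} {args[1]})"
    return f"{t.ctor}({', '.join(args)})"


x, y, zero = Term("x"), Term("y"), Term("0")
t = Term("ite", (Term("geq", (x, zero)), x, y))
print(to_theory(t))                                                  # ite((x >= 0), x, y)
print(sel("I", 2, t) == y, sel("I", 3, t) == zero, is_("ite", t))   # True True True
```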


2 Smart Enumerative SyGuS 


Our smart enumerative SyGuS approach, CVC4SY-S, is based on finding solutions for an
evolving set of constraints in an extension of the quantifier-free fragment of algebraic
datatypes. These constraints are constructed to rule out many redundant solutions without
overconstraining the problem and thereby missing actual solutions.

In detail, candidate solutions for the function $f : \tau_1 \to \tau_2$ to be synthesized are
constructed by maintaining a set of constraints $F$, initially empty, for a first-order
variable $d$ ranging over the datatype representing $\tau_2$. For example, consider again
the function $f$ with the syntactic restrictions expressed by the datatypes in Eqs. 3 and 4.
If the term generator finds a model for $F$, it provides to the solution verifier the integer
term which corresponds to the value of $d$ in the model; for example, it provides $x + 1$
when $d$ is interpreted as $\mathit{plus}(x, 1)$. In turn, if the solution verifier finds that
$x + 1$ is not a solution, it provides the blocking constraint
$\neg\mathit{is}_{plus}(d) \vee \neg\mathit{is}_x(\mathit{sel}^{\mathcal{I}}_1(d)) \vee \neg\mathit{is}_1(\mathit{sel}^{\mathcal{I}}_2(d))$,
i.e., the datatype constraint that rules out the current value for $d$, which is then added
to $F$. This is a syntactic constraint on future candidate solutions from the term generator.
Its atoms are discriminators applied to shared selector chains.
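The blocking constraint for a candidate value of d is the negation of the conjunction of discriminators that describe its shape. The following sketch builds that clause as a string. It uses hypothetical helper names and represents terms as nested Python tuples.

```python
# Argument sorts of the non-nullary constructors of datatypes (3) and (4).
SIGNATURE = {"plus": ("I", "I"), "minus": ("I", "I"), "ite": ("B", "I", "I"),
             "geq": ("I", "I"), "eq": ("I", "I"), "not": ("B",), "and": ("B", "B")}


def literals(t, chain="d"):
    """Discriminator literals (constructor, selector chain) that are true exactly
    for datatype values whose shape matches t; terms are nested tuples."""
    ctor, children = (t, ()) if isinstance(t, str) else (t[0], t[1:])
    lits = [(ctor, chain)]
    seen = {}                                   # per-sort child counter for shared selectors
    for child, sort in zip(children, SIGNATURE.get(ctor, ())):
        seen[sort] = seen.get(sort, 0) + 1
        lits += literals(child, f"sel^{sort}_{seen[sort]}({chain})")
    return lits


def blocking_constraint(t):
    """The negation of the conjunction of literals: a clause ruling out t."""
    return " ∨ ".join(f"¬is_{ctor}({chain})" for ctor, chain in literals(t))


print(blocking_constraint(("plus", "x", "1")))
# ¬is_plus(d) ∨ ¬is_x(sel^I_1(d)) ∨ ¬is_1(sel^I_2(d))
```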

CVC4SY-S uses a number of optimization techniques in addition to the basic loop
above, which we describe in the remainder of this section. These techniques produce 
blocking constraints via the lemmas-on-demand paradigm [6] that eagerly rule out spu- 
rious candidates, prior to the solution verification step. Additionally, whenever possible, 
it strengthens blocking constraints via novel generalization techniques, with the effect 
of ruling out larger classes of candidates. 


Blocking via Theory Rewriting with Structural Generalization. As we describe in
previous work [14], the enumerative solver of CVC4 uses its rewriter as an oracle for
discovering when candidate solutions are redundant. The motivation is that for any two
equivalent terms $t$ and $s$, only one of them needs to be checked with the solution
verifier, since either both $t$ and $s$ are solutions to the synthesis conjecture or neither
is. Given a term $t$, we write $t{\downarrow}$ to denote its rewritten form. Note that it is
possible for equivalent terms not to have the same rewritten form. This is a consequence
of the trade-offs in the implementation of CVC4's rewriter, which must balance
efficiency and completeness.
As an example, suppose that the term enumerator previously generated $x + y$ and that
$d$'s current value is the datatype term representing $y + x$, where
$(x + y){\downarrow} = (y + x){\downarrow}$. We first generate a blocking constraint template
$R[z]$ of the form
$\neg\mathit{is}_{plus}(z) \vee \neg\mathit{is}_y(\mathit{sel}^{\mathcal{I}}_1(z)) \vee \neg\mathit{is}_x(\mathit{sel}^{\mathcal{I}}_2(z))$,
where $z$ is a fresh variable. This template is subsequently instantiated with
$z \mapsto u$ for any shared selector chain $u$ of type $\mathcal{I}$ that currently (or later)
appears in $F$, starting with $d$ itself. This has the effect of ruling out all candidate
solutions that have $y + x$ as a subterm, which is justified by the fact that each such term
is equivalent to one in which all occurrences of $y + x$ are replaced by $x + y$.
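Instantiating the template R[z] at every shared selector chain amounts to blocking every candidate that contains the redundant pattern as a subterm. The sketch below models this with a toy rewriter that only normalizes the argument order of plus. The rewriter and the `Enumerator` class are illustrative assumptions and are far weaker than CVC4's rewriter.

```python
def rewrite(t):
    """Toy rewriter (assumption): only sorts the arguments of commutative plus."""
    if isinstance(t, str):
        return t
    op, *args = t
    args = [rewrite(a) for a in args]
    if op == "plus":
        args.sort(key=repr)
    return (op, *args)


def subterms(t):
    """Every subterm of t; each one sits at some shared selector chain from d."""
    yield t
    if isinstance(t, tuple):
        for child in t[1:]:
            yield from subterms(child)


class Enumerator:
    def __init__(self):
        self.seen = {}          # rewritten form -> first candidate with that form
        self.templates = []     # blocked patterns R[z], instantiated at every chain z

    def consider(self, t):
        """Return True if t must be passed to the verifier."""
        if any(s == p for s in subterms(t) for p in self.templates):
            return False                      # some instance R[u] rules out t
        nf = rewrite(t)
        if nf in self.seen and self.seen[nf] != t:
            self.templates.append(t)          # t is redundant: block it everywhere
            return False
        self.seen[nf] = t
        return True


e = Enumerator()
decisions = [e.consider(t) for t in [("plus", "x", "y"), ("plus", "y", "x"),
                                     ("minus", ("plus", "y", "x"), "1"),
                                     ("minus", ("plus", "x", "y"), "1")]]
print(decisions)        # [True, False, False, True]
```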

We employ a refinement of this technique, which we call theory rewriting with
structural generalization, which searches for and then blocks only the minimal skeleton
of the term under test that is sufficient for determining its rewritten form. For example,
consider the if-then-else term $t = \mathit{ite}(x \approx 0 \wedge y \geq 0, 0, x)$. This term
is equivalent to $x$, regardless of the value of the predicate $y \geq 0$. This can be
confirmed by the rewriter by computing that $\mathit{ite}(x \approx 0 \wedge w, 0, x){\downarrow} = x$,
where $w$ is a fresh Boolean variable. Then, instead of generating a constraint that blocks
only (the datatype value corresponding to) $t$, we generate a stronger constraint that does
not depend on the subterm $y \geq 0$. In other words, this blocking constraint rules out all
candidate solutions that contain the subterm $\mathit{ite}(x \approx 0 \wedge w, 0, x)$, for any
term $w$. We compute these generalizations using a recursive algorithm that iteratively
replaces each subterm of the current candidate with a fresh variable and checks whether
its rewritten form remains the same.
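The generalization loop described above can be sketched as follows. Reproducing CVC4's rewriter is out of scope, so this sketch swaps in a different equivalence oracle. Two terms count as equivalent if they agree on every point of a small integer domain and for every value of the fresh variables (called holes here). The check is therefore a semantic approximation of the rewriting test, and all names are hypothetical.

```python
from itertools import product

BOOL_OPS = {"and", "eq", "geq"}
BINARY = {"and": lambda a, b: a and b, "eq": lambda a, b: a == b,
          "geq": lambda a, b: a >= b, "plus": lambda a, b: a + b}


def sort_of(t):
    if isinstance(t, tuple):
        if t[0] == "hole":
            return t[2]
        return "B" if t[0] in BOOL_OPS else "I"
    return "I"                                 # integer constants and variables


def evaluate(t, env):
    if isinstance(t, int):
        return t
    if isinstance(t, str):
        return env[t]                          # variable x or y
    if t[0] == "hole":
        return env[t[1]]
    if t[0] == "ite":
        return evaluate(t[2], env) if evaluate(t[1], env) else evaluate(t[3], env)
    return BINARY[t[0]](evaluate(t[1], env), evaluate(t[2], env))


def holes(t):
    if not isinstance(t, tuple):
        return set()
    if t[0] == "hole":
        return {t[1:]}
    return set().union(*(holes(c) for c in t[1:]))


def equivalent(s, t):
    """Stand-in for comparing rewritten forms: exhaustive evaluation on a small
    domain, for every value of every hole (assumption, not CVC4's rewriter)."""
    hs = sorted(holes(s) | holes(t))
    names = ["x", "y"] + [name for name, _ in hs]
    domains = [range(-2, 3)] * 2 + [(False, True) if srt == "B" else range(-2, 3) for _, srt in hs]
    return all(evaluate(s, dict(zip(names, vals))) == evaluate(t, dict(zip(names, vals)))
               for vals in product(*domains))


def positions(t, path=()):
    yield path
    if isinstance(t, tuple) and t[0] != "hole":
        for i, child in enumerate(t[1:], start=1):
            yield from positions(child, path + (i,))


def at(t, path):
    for i in path:
        t = t[i]
    return t


def replace(t, path, new):
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace(t[i], path[1:], new),) + t[i + 1:]


def generalize(t):
    """Greedily replace subterms by fresh holes while the term stays equivalent."""
    original, fresh, progress = t, 0, True
    while progress:
        progress = False
        for path in positions(t):
            sub = at(t, path)
            if isinstance(sub, tuple) and sub[0] == "hole":
                continue
            candidate = replace(t, path, ("hole", f"w{fresh}", sort_of(sub)))
            if equivalent(candidate, original):
                t, fresh, progress = candidate, fresh + 1, True
                break
    return t


T = ("ite", ("and", ("eq", "x", 0), ("geq", "y", 0)), 0, "x")
print(generalize(T))
# ('ite', ('and', ('eq', 'x', 0), ('hole', 'w0', 'B')), 0, 'x')
```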


Blocking via CEGIS with Structural Generalization. Synthesis solvers based on CEGIS
maintain a list of refinement points that witness the infeasibility of previous candidate
solutions. That is, given a synthesis conjecture $\exists f.\, \forall \bar{x}.\, \varphi[f, \bar{x}]$,
the solver maintains a growing list $p_1, \dots, p_n$ of values for $\bar{x}$ that witness the
infeasibility of previous candidates $u_1, \dots, u_n$ for $f$. Then, when a new candidate
$u$ is generated, we first check whether $\varphi[u, p_i]$ is false for some $i \leq n$. When a
candidate $u$ fails to satisfy $\varphi[u, p_i]$, CVC4SY-S further applies a form of
generalization analogous to the structural generalization described above. We call this
CEGIS with structural generalization, where the goal is to find the minimal skeleton of
$u$ that also fails to satisfy some refinement point. For example, suppose $f$ is the
function to synthesize, $\varphi$ includes the constraint $f(x, y) \leq x - 1$, and
$p_1 = (3, 3)$ is a refinement point. Then, the candidate term
$u[x, y] = \mathit{ite}(x \geq 0, x, y + 1)$ will be discarded, because
$\mathit{ite}(3 \geq 0, 3, 4) \not\leq 2$. Notice, however, that any candidate
$u' = \mathit{ite}(x \geq 0, x, w)$ is falsified by $p_1$, regardless of what $w$ is, since
$u'[3, 3] \leq 2$ is equivalent to $3 \leq 2$. This indicates that we can block all
$\mathit{ite}$ candidate terms with condition $x \geq 0$ and true branch $x$. We can express
this constraint in CVC4SY-S by dropping the disjuncts that relate to the false branch of the
$\mathit{ite}$ term. This form of blocking is particularly useful when synthesizing multiple
functions $(f_1, \dots, f_n)$, since it is often the case that a candidate for a single $f_i$ is
already sufficient to falsify the specification, regardless of what the candidates for the
other functions are.
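CEGIS with structural generalization can be sketched with partial evaluation. A dropped subterm is represented by `None`, and a skeleton is kept as long as it still evaluates to a value that violates the specification at the refinement point. The specification f(x, y) ≤ x − 1 and the point (3, 3) follow the example above. The helper names are hypothetical.

```python
def peval(t, env):
    """Evaluate t at a point; None marks a dropped subterm, and the result is
    None when it depends on one."""
    if t is None or isinstance(t, int):
        return t
    if isinstance(t, str):
        return env[t]
    if t[0] == "ite":
        c = peval(t[1], env)
        return None if c is None else peval(t[2] if c else t[3], env)
    a, b = peval(t[1], env), peval(t[2], env)
    if a is None or b is None:
        return None
    return a + b if t[0] == "plus" else a >= b      # "plus" or "geq"


def refuted(u, point):
    """Does candidate u violate the hypothetical constraint f(x, y) <= x - 1 at point?"""
    x, y = point
    v = peval(u, {"x": x, "y": y})
    return v is not None and not v <= x - 1


def positions(t, path=()):
    yield path
    if isinstance(t, tuple):
        for i, child in enumerate(t[1:], start=1):
            yield from positions(child, path + (i,))


def replace(t, path, new):
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace(t[i], path[1:], new),) + t[i + 1:]


def skeleton(u, point):
    """Drop subterms (pre-order, greedily) as long as u is still refuted at point."""
    progress = True
    while progress:
        progress = False
        for path in positions(u):
            candidate = replace(u, path, None)
            if candidate != u and refuted(candidate, point):
                u, progress = candidate, True
                break
    return u


U = ("ite", ("geq", "x", 0), "x", ("plus", "y", 1))
print(refuted(U, (3, 3)), skeleton(U, (3, 3)))
# True ('ite', ('geq', 'x', 0), 'x', None)
```

The `None` in the result corresponds to the dropped disjuncts for the false branch of the ite term.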


Evaluation Unfolding. This technique uses evaluation functions to encode the
relationship between the datatype terms assigned to $d$ and their analogs in the theory $T$.
For example, the evaluation function for the datatype $\mathcal{I}$ defined in (3) is a function
$E_{\mathcal{I}} : \mathcal{I} \times \mathrm{Int} \times \mathrm{Int} \to \mathrm{Int}$ defined
axiomatically so that $E_{\mathcal{I}}(d, m, n)$ denotes the result of evaluating $d$ by
interpreting any occurrences of $x$ and $y$ in $d$ respectively as $m$ and $n$ and
interpreting the other constructors as the corresponding arithmetic/Boolean operators,
e.g. $E_{\mathcal{I}}(\mathit{minus}(x, y), 5, 3)$ is interpreted as $2$. When a refinement
point $\bar{c}$ is generated, we add a constraint requiring that the evaluation of $d$ at
$\bar{c}$ must satisfy the specification. For example, for the conjecture
$\exists f.\, \forall x.\, f(x + 1, x) \leq 0$ and the refinement point $x \mapsto 1$, we add
the constraint $E_{\mathcal{I}}(d, 2, 1) \leq 0$. Then, when a literal $\mathit{is}_C(t)$ is
asserted for a term $t$ whose type is a datatype $\tau$, we can add a constraint corresponding
to the one-step unfolding of the evaluation of $t$. Specifically, when $\mathit{is}_{ite}(d)$ is
asserted, we generate the constraint


$\mathit{is}_{ite}(d) \Rightarrow E_{\mathcal{I}}(d, 2, 1) \approx \mathit{ite}(E_{\mathcal{B}}(\mathit{sel}^{\mathcal{B}}_1(d), 2, 1), E_{\mathcal{I}}(\mathit{sel}^{\mathcal{I}}_1(d), 2, 1), E_{\mathcal{I}}(\mathit{sel}^{\mathcal{I}}_2(d), 2, 1))$


indicating that the evaluation of $d$ on point $(2, 1)$ indeed behaves like an $\mathit{ite}$
term when $d$ has top symbol $\mathit{ite}$. Our implementation adds these constraints for all
terms $t$ whose top symbols correspond to $\mathit{ite}$ or Boolean connectives. For terms $t$
whose top symbol is any of the other operators, we add constraints corresponding to the total
evaluation of $t$ when the value of $t$ is fully determined, for example,
$t \approx \mathit{plus}(x, y) \Rightarrow E_{\mathcal{I}}(t, 2, 1) \approx 3$. Notice that this
constraint with $t = d$, along with the refinement constraint $E_{\mathcal{I}}(d, 2, 1) \leq 0$,
suffices to show that $d$ cannot be $\mathit{plus}(x, y)$.
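A concrete (non-symbolic) counterpart of the evaluation functions makes the example above easy to check. In CVC4SY, $E_{\mathcal{I}}$ and $E_{\mathcal{B}}$ are axiomatized and unfolded lazily inside the solver. The Python function below simply computes their values, so it only illustrates what the generated constraints state. The name `E` and the tuple encoding are assumptions of this sketch.

```python
LEAVES = {"0": lambda m, n: 0, "1": lambda m, n: 1, "x": lambda m, n: m, "y": lambda m, n: n}
OPS = {"plus": lambda a, b: a + b, "minus": lambda a, b: a - b,
       "geq": lambda a, b: a >= b, "eq": lambda a, b: a == b,
       "not": lambda a: not a, "and": lambda a, b: a and b}


def E(t, m, n):
    """Concrete counterpart of E_I / E_B: evaluate datatype term t with x := m, y := n."""
    if isinstance(t, str):
        return LEAVES[t](m, n)
    op, *args = t
    if op == "ite":
        # one-step unfolding: E(ite(c, a, b)) = ite(E(c), E(a), E(b))
        c, a, b = (E(s, m, n) for s in args)
        return a if c else b
    return OPS[op](*(E(s, m, n) for s in args))


# Refinement constraint for  exists f. forall x. f(x + 1, x) <= 0  at x -> 1:  E(d, 2, 1) <= 0
print(E(("minus", "x", "y"), 5, 3))                                           # 2
print(E(("plus", "x", "y"), 2, 1) <= 0)                                       # False
print(E(("ite", ("geq", "x", "y"), ("minus", "y", "x"), "0"), 2, 1) <= 0)    # True
```

The second line confirms that $d$ cannot be $\mathit{plus}(x, y)$. The third candidate satisfies the refinement constraint at this point.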


3 Fast Enumerative SyGuS 


The techniques in the previous section prune the search space so that often, only a small 
subset of the entire possible set of terms is considered for a given term size bound. 
The main bottleneck, however, is managing the large number of blocking constraints 
generated. Moreover, the benefits of this approach are limited when the grammar or 
specification does not admit opportunities for generalization. 

For this reason, we have also developed CVC4SY-F, which, in the spirit of other 
SyGuS solvers (notably ESOLVER [17]), relies on a principled brute-force approach 
for term generation. In contrast to other solvers, however, which are built as layers on 
top of the core SMT reasoner, CVC4SY-F is fully integrated as a subsolver of CVC4,
so communication with other components has almost no overhead. This technique, fast 
enumerative synthesis, does not use constraint solving to generate new terms. As a 
result, the majority of optimizations from Sect. 2 are incompatible with it. 


Algorithm. To generate terms up to a given size $k$, we maintain a set $S^k_\tau$ of terms
of type $\tau$ and size $k$ for each datatype $\tau$ corresponding to a non-terminal symbol of
our input grammar $R$. First, we compute for each such $\tau$ the set $\mathcal{C}_\tau$ of its
constructor classes, i.e., the equivalence classes of the relation over the constructors of
$\tau$ that groups them by their type. For example, the constructor classes for $\mathcal{I}$
are $\{x, y, 0, 1\}$, $\{\mathit{plus}, \mathit{minus}\}$, and $\{\mathit{ite}\}$. Then, we use
the following procedure for generating all terms of size $k$ for type $\tau$:


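FASTENUM($\tau$, $k$):
For all:
- constructor classes $\mathbf{C} \in \mathcal{C}_\tau$ whose elements have type $\tau_1 \times \dots \times \tau_n \to \tau$,
- tuples of naturals $(k_1, \dots, k_n)$ such that $k_1 + \dots + k_n + \mathit{ite}(n > 0, 1, 0) = k$,
  (a) run FASTENUM($\tau_i$, $k_i$) for each $i = 1, \dots, n$,
  (b) add $C(t_1, \dots, t_n)$ to $S^k_\tau$ for all tuples $(t_1, \dots, t_n)$ with $t_i \in S^{k_i}_{\tau_i}$ and all
      constructors $C \in \mathbf{C}$.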

The recursive procedure FASTENUM($\tau$, $k$) populates the set $S_\tau^k$ of all terms of
type $\tau$ with size $k$. These sets are cached globally. We incorporate an optimization
that only adds terms $c(t_1, \dots, t_n)$ to $S_\tau^k$ whose corresponding terms in the
theory $T$ are unique up to rewriting. This mimics the effect of blocking via theory
rewriting as described in Sect. 2. For example, plus(y, x) is not added to $S_Z^1$ if that
set already contains plus(x, y), noting that $(x + y){\downarrow} = (y + x){\downarrow}$. By
construction of $S_\tau^k$ for $k > 1$, this has the cascading effect of excluding all terms
having $y + x$ as a subterm.
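To make FASTENUM concrete, here is a minimal Python sketch under stated assumptions; it is not CVC4's implementation. The grammar is a hypothetical tuple encoding of the running example, the Boolean non-terminal B and its constructor leq are invented only so that ite is well typed, and normalize() is a toy stand-in for theory rewriting that merely sorts the arguments of plus. Leaves have size 0 and every other constructor adds 1, matching the $\mathit{ite}(n > 0, 1, 0)$ term in the procedure.

```python
from functools import lru_cache
from itertools import product

# Hypothetical grammar: constructor -> (argument types, result type).
# The non-terminal B and its constructor "leq" are assumptions.
GRAMMAR = {
    "x": ((), "Z"), "y": ((), "Z"), "0": ((), "Z"), "1": ((), "Z"),
    "plus": (("Z", "Z"), "Z"), "minus": (("Z", "Z"), "Z"),
    "ite": (("B", "Z", "Z"), "Z"),
    "leq": (("Z", "Z"), "B"),
}


def constructor_classes(tau):
    """Group the constructors of type tau by their argument types."""
    classes = {}
    for cons, (args, result) in GRAMMAR.items():
        if result == tau:
            classes.setdefault(args, []).append(cons)
    return classes


def compositions(k, n):
    """All tuples (k1, ..., kn) of naturals with k1 + ... + kn == k."""
    if n == 0:
        if k == 0:
            yield ()
        return
    for first in range(k + 1):
        for rest in compositions(k - first, n - 1):
            yield (first,) + rest


def normalize(term):
    """Toy stand-in for theory rewriting: sort the arguments of commutative plus."""
    if isinstance(term, tuple) and term[0] == "plus":
        return ("plus",) + tuple(sorted(term[1:], key=repr))
    return term


@lru_cache(maxsize=None)  # the sets S_tau^k are cached globally
def fast_enum(tau, k):
    """Terms of type tau and size k, unique up to normalize()."""
    terms, seen = [], set()
    for arg_types, cls in constructor_classes(tau).items():
        n = len(arg_types)
        # children sizes sum to k - 1 for non-leaves, and to k = 0 for leaves
        for sizes in compositions(k - (1 if n > 0 else 0), n):
            pools = [fast_enum(t, ki) for t, ki in zip(arg_types, sizes)]
            for args in product(*pools):
                for cons in cls:  # one child tuple serves the whole class
                    term = (cons,) + args if n > 0 else cons
                    key = normalize(term)
                    if key not in seen:
                        seen.add(key)
                        terms.append(term)
    return tuple(terms)


print(len(fast_enum("Z", 1)))  # 26: 10 plus-terms up to commutativity, 16 minus-terms
```

Because a constructor class shares one set of child tuples, adding another constructor with an existing signature costs no extra recursive calls, which is why the grouping pays off on grammars with many operators of the same type.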

We observe that theory rewriting with structural generalization cannot be easily 
incorporated into this scheme since it requires the use of a constraint solver, something 
that the above algorithm seeks to avoid. 


4 Hybrid Approach: Variable-Agnostic Enumerative SyGuS 


We follow a third approach, in solver CVC4SY-H, that combines elements of the previous
approaches. The idea is to use the (smart) approach from Sect. 2 to generate terms, but
then generate multiple candidate solutions from each term using a fast subprocedure we
call a concretizer. We implement an instance of this scheme, which we call
variable-agnostic term generation, that produces only terms that are unique modulo
alpha-equivalence. In our running example, when a term $t$ such as $x + 1$ is produced,
the concretizer produces all terms generated by the grammar $R$ that are alpha-equivalent
to $t$, namely, $\{x + 1, y + 1\}$ in this case. The advantage of this approach is that
CVC4SY-H can block any term whose variables are not canonically ordered; that is,
assuming for instance that $x < y$, it may block terms like $1 - y$ and $y + y$, noting
they are alpha-equivalent to $1 - x$ and $x + x$, respectively. To implement this blocking
scheme, we introduce unary Boolean predicates $\mathit{pre}_x$ and $\mathit{post}_x$ for
each variable $x$ in our grammar, where $\mathit{pre}_x$ (resp., $\mathit{post}_x$) holds
for $t$ if and only if variable $x$ occurs in a depth-first left-to-right traversal of our
candidate term before (resp., after) traversing to the position indicated by the selector
chain $t$. We encode the semantics of these predicates based on the arguments of
constructors in our signature, e.g.,
$\mathit{is}_{\mathit{plus}}(z) \Rightarrow \big((\mathit{pre}_x(z) \Leftrightarrow \mathit{pre}_x(\mathit{sel}_1(z))) \wedge (\mathit{post}_x(\mathit{sel}_2(z)) \Leftrightarrow \mathit{post}_x(z))\big)$.
We then assert that $\mathit{pre}_x$ and $\mathit{pre}_y$ are false for our top-level
variable $d$, and require $\mathit{is}_y(z) \Rightarrow \mathit{pre}_x(z)$ for all $z$,
stating that $x$ must come before $y$ in the traversal of any generated term.
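The blocking condition and the concretizer can be illustrated outside the solver. The following Python sketch checks terms directly instead of encoding $\mathit{pre}_x$/$\mathit{post}_x$ as datatype constraints; the tuple encoding of terms, the variable set, and the order $x < y$ are assumptions taken from the running example. For two variables, is_canonical admits exactly the terms allowed by $\mathit{is}_y(z) \Rightarrow \mathit{pre}_x(z)$, and concretize recovers the blocked alpha-variants.

```python
from itertools import permutations

VARIABLES = ("x", "y")  # assumed canonical order: x < y


def first_occurrences(term):
    """Grammar variables in depth-first, left-to-right order of first occurrence."""
    order = []

    def visit(t):
        if isinstance(t, tuple):  # (constructor, child_1, ..., child_n)
            for child in t[1:]:
                visit(child)
        elif t in VARIABLES and t not in order:
            order.append(t)

    visit(term)
    return order


def is_canonical(term):
    """True unless the term is blocked.

    Variables must first occur in canonical order; for two variables this is
    the constraint is_y(z) => pre_x(z): y may only occur after some x.
    """
    order = first_occurrences(term)
    return order == list(VARIABLES[:len(order)])


def rename(term, mapping):
    if isinstance(term, tuple):
        return (term[0],) + tuple(rename(child, mapping) for child in term[1:])
    return mapping.get(term, term)


def concretize(term):
    """All alpha-equivalent variants of term over the grammar's variables."""
    order = first_occurrences(term)
    return [rename(term, dict(zip(order, image)))
            for image in permutations(VARIABLES, len(order))]


print(is_canonical(("minus", "1", "y")))  # False: alpha-equivalent to 1 - x
print(concretize(("plus", "x", "1")))     # [('plus', 'x', '1'), ('plus', 'y', '1')]
```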

This technique is useful for grammars with many variables, such as grammars in 
invariant synthesis problems, where the number of terms of small size is prohibitively 
large. Blocking based on theory rewriting (with generalization) from Sect. 2 is compatible
with this technique and is used in CVC4SY-H. However, the other optimizations are
disabled, since they prune solutions in a way that is not agnostic to variables.


5 Evaluation 


Table 1. Summary of number of problems solved per benchmark set.

Set           #  a+si     a     s  s-cg  s-eu  s-rg   s-r     f   f-r     h  h-rg   h-r   EUS
General     413   293   237   228   229   232   230   220   237   226   221   225   213   290
Gen-CrCi    214   159   159   159   159   143   159   159   155   132   130   137   125   152
CLIA         88    86    20    20    19    19    19    18    20    16    16    16    16    85
INV         127   109   109   109   109   109   109   109   110   109   109   109   109    68
PBE-BV      753   751   751   721   721   721   721   628   751   717   721   721   628   745
PBE-Str     109   105   105   104   104   104    87    75   105   103   102    87    75    74
Subtotal   1704  1503  1381  1341  1341  1328  1325  1209  1378  1303  1299  1295  1166  1414
IC-BV       160   135   135   135   132   130   130   133   138   132   128   126   127     –
CegisT       79    56    43    43    43    43    42    41    42    42    42    42    41     –
Lustre      485   255   255   255   255   218   211   221   231   213   248   244   234     –
Total      2428  1949  1814  1774  1771  1719  1708  1604  1789  1690  1717  1707  1568     –

We evaluated the above techniques in CVC4SY on four benchmark sets: invariant synthesis
benchmarks from the verification of Lustre [11] models; a set from work on synthesizing
invertibility conditions for bit-vector operators [12] (IC-BV); a set of bit-vector
invariant synthesis problems [2] (CegisT); and the SyGuS-COMP 2018 [1] benchmarks from
five tracks: assorted problems (General), conditional linear arithmetic problems (CLIA),
invariant synthesis problems (INV), and programming-by-examples problems [10] with a set
over bit-vectors (PBE-BV) and another over strings (PBE-Str). We also considered
separately the CrCi subset from General, which corresponds
to cryptographic circuit synthesis. We ran our experiments on a cluster equipped with 
Intel E5-2637 v4 CPUs running Ubuntu 16.04, providing one core, a time limit of 1800 s, and 8 GB
RAM for each job. Results are summarized in Table 1 and Fig. 2. We denote the strate- 
gies from Sects. 2, 3, and 4 by s, f and h, respectively (smart, fast, and hybrid); disabling 
the optimizations from Sect. 2 is marked by “-” and the suffixes r (rewriting), rg (rewrit- 
ing with structural generalization), cg (CEGIS with structural generalization), and eu 
(evaluation unfolding). We also evaluated two meta-strategies of CVC4SY: a and a+si. 
The auto strategy a picks a strategy based on the properties of the problem: f for PBE 
problems and for problems without the Boolean type or the ite operator in their grammar,
and s otherwise. Strategy a+si uses the single-invocation solver [14] on problems
that are amenable to quantifier elimination and a otherwise. We use the state-of-the-art 
SyGuS solver EUSOLVER [4] (EUS) as a baseline, but only for SyGuS-COMP bench- 
marks due to limitations in its parser. 

Overall, strategy s excels on more challenging benchmark sets such as Lustre and
Gen-CrCi, while strategy f excels on the majority of the others. The gains for f are
especially significant on PBE problems, where it outperforms both s and EUS by several
orders of magnitude. Such gains are significant given that CVC4 won this track at
SyGuS-COMP 2018 by employing s alone, and a variant of EUS won it in 2017. This result
can be explained as a consequence of two factors. First, the string and bit-vector
grammars contain many operators with the same type, making the constructor class
optimization of the f algorithm very effective. Second, although not described in this
paper, all solvers in our evaluation use divide-and-conquer algorithms for PBE
problems [4], which are not compatible with the optimizations cg and eu. For all CVC4SY
strategies and all benchmark sets, the most important optimization is r. The
optimization eu is especially effective when grammars contain ite and Boolean
connectives, such as those in the Lustre set and in some subsets of General, on which we
see the biggest gains of s with respect to s-eu; cg is more helpful for IC-BV, with a few
harder benchmarks solved only thanks to this technique.

Fig. 2. Cactus plot on commonly supported benchmark sets. The first scatter plot is for the Lustre
set, the second for the Gen-CrCi set, and the latter two for the 862 benchmarks from the PBE sets.

The first scatter plot in Fig. 2 shows the advantage of h over s on Lustre, a benchmark
set containing invariant synthesis problems with dozens of variables. We remark that
this configuration excels at quickly finding small solutions for problems with many
variables, although it solves fewer problems overall. The second scatter plot shows that while
s takes significantly longer on easy problems, it outperforms f in the long run. The last 
two plots show that f significantly outperforms the state of the art on PBE benchmarks. 

For all benchmark sets, the auto strategy a chooses the best enumerative strategy
of CVC4SY with only a few exceptions, and hence it is the default configuration of
CVC4SY. Due to specialized synthesis techniques [4, 14], both a+si and EUS outperform
the purely enumerative strategies of CVC4SY. This is reflected in the cactus plot on the
commonly supported benchmark sets, where a and f solve more benchmarks than EUS 
for lower times but then EUS solves more benchmarks in the end. For a+si, the cactus 
plot shows that it outperforms EUS significantly. Nevertheless, we remark that a+si is 
able to solve only 393 (16%) of the overall benchmarks using only single invocation 
techniques. Hence, we conclude that both smart and fast enumerative strategies are 
critical subcomponents in our approach to syntax-guided synthesis. 

Abstract. Quantifier elimination and its cousin functional synthesis are
fundamental problems in automated reasoning that could be used in
many applications of formal methods. But effective algorithms are still
elusive. In this paper, we suggest a simple modification to a QBF algorithm
to adapt it for quantifier elimination and functional synthesis. We
demonstrate that the approach significantly outperforms previous
algorithms for functional synthesis.


1 Introduction 


Given a Boolean formula $\exists Y.\,\varphi$ with free variables $X$, quantifier elimination
(also called projection) is the problem of finding a formula $\psi \equiv \exists Y.\,\varphi$
that only contains variables $X$. Closely related, the functional synthesis problem is to
find a function $f_y : 2^X \to \mathbb{B}$ for all $y \in Y$, such that
$\varphi[Y \mapsto f_Y(X)] \equiv \exists Y.\,\varphi$.
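On tiny instances, the functional synthesis condition can be checked by brute-force enumeration. The following Python sketch is our own illustration, not part of CADET; the formula $\varphi = (y \Leftrightarrow x_1) \wedge (x_2 \vee y)$ and the candidate functions are hypothetical. Because no $y$ satisfies $\varphi$ when $x_1 = x_2 = 0$, two different candidates qualify: they may disagree on that input.

```python
from itertools import product


def assignments(names):
    """All assignments to the given variables (the space 2^names), as dicts."""
    for bits in product((False, True), repeat=len(names)):
        yield dict(zip(names, bits))


def solves_functional_synthesis(phi, xs, ys, f):
    """Check phi[Y -> f(X)] == exists Y. phi on every assignment to X.

    phi maps a full assignment (dict) to bool; f maps an X-assignment to a
    Y-assignment. Exponential enumeration: only for tiny examples.
    """
    for x in assignments(xs):
        exists_y = any(phi({**x, **y}) for y in assignments(ys))
        if phi({**x, **f(x)}) != exists_y:
            return False
    return True


def phi(v):
    # Hypothetical formula: (y <-> x1) and (x2 or y).
    return v["y"] == v["x1"] and (v["x2"] or v["y"])


X, Y = ["x1", "x2"], ["y"]
print(solves_functional_synthesis(phi, X, Y, lambda x: {"y": x["x1"]}))  # True
# For x1 = x2 = 0 no y satisfies phi, so f may output anything there:
print(solves_functional_synthesis(phi, X, Y, lambda x: {"y": x["x1"] or not x["x2"]}))  # True
print(solves_functional_synthesis(phi, X, Y, lambda x: {"y": not x["x1"]}))  # False
```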

Quantifier elimination and functional synthesis are fundamental operations in 
automated reasoning, computer-aided design, and verification. Hence, progress 
in algorithms for these problems benefits a broad range of applications of for- 
mal methods. For example, typical algorithms for reactive synthesis reduce to 
computing the safe region of a safety game through repeated quantifier eliminations
[1-3] or directly employ functional synthesis [4]. To this day, algorithms for
quantifier elimination often involve (reduced ordered) Binary Decision Diagrams
(BDDs) [5]. However, BDDs often grow exponentially for applications in
verification, and extracting formulas (or strategies, etc.) from BDDs typically 
results in huge expressions. The search for alternatives resulted in CEGAR-style 
algorithms [6-10]. 

In this work, we take a look at the closely related field of QBF solving. There,
pure CEGAR solving [11-13] on the CNF representation is no longer competitive
[14], and it has been augmented by preprocessing [15,16], circuit representations
[17-21], and Incremental Determinization (ID) [22]. It may hence be
fruitful to leverage some of the recent developments of QBF.

The contribution of this work is a simple modification of ID to enable quantifier
elimination and functional synthesis. Incremental Determinization (ID) is an
algorithm for solving quantified Boolean formulas of the shape $\forall X.\,\exists Y.\,\varphi$,
where $\varphi$ is a propositional formula in conjunctive normal form (CNF), i.e., 2QBF. It
follows a proof-theoretic approach, very similar to a SAT solver, alternating
between building a model (i.e., Skolem functions for the existential variables $Y$)
and a refutation proof [23]. This allows ID to provide a model (i.e., a Skolem
function) when it determines that a formula is true, which sets it apart from
other QBF algorithms.

The modification of ID to enable quantifier elimination for a given formula
$\exists Y.\,\varphi$ is very simple: We run ID on the formula as if it were the quantified
Boolean formula $\forall X.\,\exists Y.\,\varphi$, where $X$ are the free variables, but add $\varphi$ to
the conflict check within ID. This suppresses the UNSAT result in the ID algorithm,
and it is hence forced to terminate with a model (that is, a function), which is
guaranteed to satisfy the functional synthesis requirements. Quantifier elimination
is then only a substitution away.

Our experimental evaluation shows that ID significantly outperforms previ- 
ous algorithms for functional synthesis and quantifier elimination. 

This paper is structured as follows: We review related work in Sect. 2 and 
introduce standard notation in Sect. 3. In Sect. 4 we first review the Incremental 
Determinization algorithm before introducing the change necessary to lift it to 
functional synthesis. The experimental evaluation is in Sect. 5. We summarize
the current state of the tool CADET in Sect. 6 and conclude the paper in Sect. 7. 


2 Related Work 


Functional Synthesis. Early works on functional synthesis tried to exploit Craig 
interpolation, but did not scale well enough [24]. This was followed by first 
attempts to use CEGAR [6], which failed, however, to surpass the performance 
of BDDs [7]. More recent works revisited the use of BDDs, e.g. the tools SSyft [25] 
and RSynth [26,27]. This motivated the search for alternatives to BDDs [8-10]. 
At their core, these new algorithms all rely on counter-example guided abstrac- 
tion refinement (CEGAR) [28], but they apply it in clever, compositional ways. 
However, they still inherit the well-known weaknesses of CEGAR (as, for example,
discussed in the QBF literature): For the simple formula
$\varphi = \bigwedge_{i \le n} x_i \Leftrightarrow y_i$, where $n = |X| = |Y|$, $x_i \in X$, and
$y_i \in Y$, CEGAR needs to browse through $2^n$ satisfying assignments just to recover
that the function we were looking for is $f(\mathbf{x}) = \mathbf{x}$.

The Back-and-Forth algorithm explores stronger abstraction using MaxSAT 
solvers as a means to reduce the number of assignments that CEGAR needs 
to explore [8]. ParSyn attempts to combat the problem with parallel compute 
power and a compositional approach [9]. This compositional approach has later 
been refined using a wDNNF decomposition [10].


QBF Certification. Some solvers and preprocessors for QBF have the ability to 
not only provide a yes/no answer, but also produce a certificate (i.e., Skolem
functions) for their result [13,22,29,30]. While most QBF approaches suffer heavy
performance penalties when asked to provide a certificate, Incremental
Determinization naturally computes Skolem functions that can be extracted easily
from the final state [22].


3 Preliminaries 


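Boolean formulas over a finite set of variables $x \in X$ with domain $\mathbb{B} = \{0, 1\}$
are generated by the following grammar:

$$\varphi := 0 \mid 1 \mid x \mid \neg\varphi \mid (\varphi) \mid \varphi \vee \varphi \mid \varphi \wedge \varphi$$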


Other logical operations, such as implication, XOR, and equality, are considered 
syntactic sugar with the usual definitions. 

An assignment $\mathbf{x}$ to a set of variables $X$ is a function $\mathbf{x} : X \to \mathbb{B}$
that maps each variable $x \in X$ to either 1 or 0. We denote the space of assignments to
some set of variables $X$ with $2^X$.

Given formulas $\varphi$ and $\varphi'$, and a variable $x$, we denote the substitution of $x$
by $\varphi'$ in $\varphi$ as $\varphi[x \mapsto \varphi']$. We lift substitutions to sets of
variables, $\varphi[X \mapsto t_X]$, when $t_X$ maps each $x \in X$ to a formula $\varphi'$.

A literal $l$ is either a variable $x \in X$ or its negation $\neg x$. We use $\bar{l}$ to
denote the literal that is the logical negation of $l$. A disjunction of literals
$(l_1 \vee \dots \vee l_n)$ is called a clause and their conjunction $(l_1 \wedge \dots \wedge l_n)$
is called a cube. We denote the variable of a literal by $\mathit{var}(l)$ and lift the
notion to clauses: $\mathit{var}(l_1 \vee \dots \vee l_n) = \{\mathit{var}(l_1), \dots, \mathit{var}(l_n)\}$.

A formula is in conjunctive normal form (CNF) if it is a conjunction of
clauses. Throughout this exposition, we assume that the input formula is given
in CNF. (The output, however, can be a non-CNF formula.) It is straightforward to
lift the approach to general Boolean formulas: Given a Boolean formula $\varphi$ over
variables $X$, the Tseitin transformation provides us with a formula $\psi$ with
$\varphi \equiv \exists Z.\,\psi$, where $Z$ are fresh variables [31]. Note that eliminating a
group of variables $X' \subseteq X$ in $\varphi$ is then the same as eliminating $X' \cup Z$ in $\psi$.
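A compact Tseitin encoding shows why $\varphi \equiv \exists Z.\,\psi$ holds. This Python sketch is our own illustration: the tuple-based formula syntax and DIMACS-style integer literals are choices made here, not CADET's input format. It introduces one fresh variable per binary connective, adds clauses defining that variable, asserts the root literal, and then confirms the projection equivalence by brute force on a small example.

```python
from itertools import product

# Formulas as nested tuples over integer variables:
# ("var", i), ("not", f), ("and", f, g), ("or", f, g).


def tseitin(formula, next_var):
    """Return (literal for formula, defining clauses, next unused variable).

    Literals are DIMACS-style ints: v for a variable, -v for its negation.
    """
    op = formula[0]
    if op == "var":
        return formula[1], [], next_var
    if op == "not":
        lit, clauses, next_var = tseitin(formula[1], next_var)
        return -lit, clauses, next_var
    a, clauses_a, next_var = tseitin(formula[1], next_var)
    b, clauses_b, next_var = tseitin(formula[2], next_var)
    z = next_var  # fresh variable standing for this subformula
    if op == "and":  # clauses for z <-> (a and b)
        defs = [[-z, a], [-z, b], [z, -a, -b]]
    else:  # op == "or": clauses for z <-> (a or b)
        defs = [[z, -a], [z, -b], [-z, a, b]]
    return z, clauses_a + clauses_b + defs, next_var + 1


def to_cnf(formula, num_vars):
    """CNF psi and fresh variables Z such that formula == exists Z. psi."""
    root, clauses, next_var = tseitin(formula, num_vars + 1)
    return clauses + [[root]], list(range(num_vars + 1, next_var))


def evaluate(formula, val):
    op = formula[0]
    if op == "var":
        return val[formula[1]]
    if op == "not":
        return not evaluate(formula[1], val)
    if op == "and":
        return evaluate(formula[1], val) and evaluate(formula[2], val)
    return evaluate(formula[1], val) or evaluate(formula[2], val)


def satisfies(clauses, val):
    return all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses)


def equivalent_after_projection(formula, num_vars):
    """Brute-force check of formula == exists Z. psi on all assignments to X."""
    cnf, fresh = to_cnf(formula, num_vars)
    for xs in product((False, True), repeat=num_vars):
        x = dict(zip(range(1, num_vars + 1), xs))
        projected = any(satisfies(cnf, {**x, **dict(zip(fresh, zs))})
                        for zs in product((False, True), repeat=len(fresh)))
        if projected != evaluate(formula, x):
            return False
    return True


# (x1 and x2) or not x3
example = ("or", ("and", ("var", 1), ("var", 2)), ("not", ("var", 3)))
print(to_cnf(example, 3)[1])                    # [4, 5]
print(equivalent_after_projection(example, 3))  # True
```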

Resolution is a well-known proof rule that allows us to merge two clauses
as follows. Given two clauses $C_1 \vee v$ and $C_2 \vee \neg v$, we call
$C_1 \otimes_v C_2 = C_1 \vee C_2$ their resolvent with pivot $v$. The resolution rule states
that $C_1 \vee v$ and $C_2 \vee \neg v$ imply their resolvent. Resolution is refutationally
complete for Boolean formulas in CNF, i.e., given a formula in CNF that is equivalent
to false, we can derive the empty clause using only resolution.
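As a small illustration of the resolution rule, here is a Python sketch with clauses as frozensets of DIMACS-style integer literals (an encoding chosen for this example). It refutes the unsatisfiable formula $(x_1) \wedge (\neg x_1 \vee x_2) \wedge (\neg x_2)$ by deriving the empty clause in two steps.

```python
def resolve(c1, c2, pivot):
    """Resolvent C1 (x)_v C2 of the clauses c1 = C1 or v and c2 = C2 or not v.

    Clauses are frozensets of DIMACS-style int literals (v positive, -v negated).
    """
    if pivot not in c1 or -pivot not in c2:
        raise ValueError("pivot must occur positively in c1 and negatively in c2")
    return (c1 - {pivot}) | (c2 - {-pivot})


# Refuting (x1) and (not x1 or x2) and (not x2):
step1 = resolve(frozenset({-1, 2}), frozenset({-2}), pivot=2)  # frozenset({-1})
step2 = resolve(frozenset({1}), step1, pivot=1)                # empty clause
print(step2 == frozenset())  # True
```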


4 Lifting Incremental Determinization 


In the sequel, we formally define functional synthesis, review the working principle
of Incremental Determinization for 2QBF, discuss how the solver state
corresponds to functions, and then introduce the modification to Incremental
Determinization to turn it into an algorithm for functional synthesis. The functional
synthesis problem is to find a function $f_y : 2^X \to \mathbb{B}$ for all $y \in Y$, such
that $\varphi[Y \mapsto f_Y(X)] \equiv \exists Y.\,\varphi$. Functional synthesis is closely related
to solving 2QBF: Given a true 2QBF problem $\forall X.\,\exists Y.\,\varphi$, any Skolem function that
is a model for the formula is also a solution to the functional synthesis problem for
variable sets $X$ and $Y$. The problems differ only for false 2QBF: if there is an
assignment $\mathbf{x}$ to $X$ for which there is no assignment to $Y$, the 2QBF cannot be
proven with a Skolem function, but the functional synthesis problem still requires us
to produce a function $f$. For such an input $\mathbf{x}$, $f$ may produce any output. We
will exploit this similarity between 2QBF and functional synthesis in the following to
lift the Incremental Determinization algorithm to functional synthesis.


4.1 Working Principle of Incremental Determinization for 2QBF 


ID was originally introduced as an algorithm for 2QBF, the fragment of quantified
Boolean formulas with at most one quantifier alternation. Given a formula
$\forall X.\,\exists Y.\,\varphi$, ID alternates between constructing a model (i.e., a Skolem function)
to prove the formula correct, and constructing a Q-resolution proof to refute
the formula [32]. During model construction, ID identifies which variables in
$Y$ have unique Skolem functions considering the current set of clauses. When
all variables with unique Skolem functions are identified, ID greedily introduces
additional clauses to reduce the space of possible Skolem functions, such that the
remaining variables may get unique Skolem functions, too. Whenever the model
construction ends up in a dead end (i.e., a conflict), ID switches to constructing a
refutation proof [32] and derives clauses using resolution. As soon as ID has found a
clause that prevents the model construction from trying the same partial model
again, it switches back to the model search. Since there are only finitely many
clauses and models, either the model construction or the refutation proof must
eventually finish [22,23].


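Example 1. We will use the following formula as a running example:

$$\begin{aligned}
\forall x_1, x_2.\ \exists y_1, y_2, y_3.\ & (x_1 \vee \neg y_1) \wedge (\neg x_1 \vee y_1) \wedge {} \\
& (y_1 \vee \neg y_2) \wedge (\neg y_1 \vee \neg x_2 \vee y_2) \wedge {} \\
& (\neg y_1 \vee y_3) \wedge (y_2 \vee \neg y_3) \wedge (x_2 \vee \neg y_3)
\end{aligned}$$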


Looking at the first two clauses, it is clear that $y_1$ is uniquely determined by $x_1$
and $y_1$'s Skolem function must be $f_{y_1}(X) = x_1$. For this step, we intentionally
ignore all clauses of $y_1$ that contain $y_2$ and $y_3$, as they do not yet have a Skolem
function and we have to consider them as undefined. The other clauses containing
$y_1$ will only become relevant when looking for Skolem functions for $y_2$ and $y_3$.

Variables $y_2$ and $y_3$ do not have unique Skolem functions in the formula
above. ID would now greedily add a decision clause, such as $(x_2 \vee \neg y_2)$, to also
make the Skolem function for $y_2$ unique. The added clause, together with clauses 3
and 4 of the formula, defines $f_{y_2}(X) = f_{y_1}(X) \wedge x_2$.

This results in the situation that there is no Skolem function for $y_3$: For the
assignment $x_1 \mapsto 1$, $x_2 \mapsto 0$, the functions for $y_1$ and $y_2$ assign
$y_1 \mapsto 1$, $y_2 \mapsto 0$. Then clauses 5 and 6 cannot both be satisfied by $y_3$, which
means there is a conflict for this assignment to the universals. During conflict
analysis, ID would now resolve clauses 5 and 6 to obtain the clause $(\neg y_1 \vee y_2)$,
and then backtrack to the point before introducing the decision clause. ◁


4.2 Representation of Functions 


What is particularly interesting about ID is its ability to produce Skolem functions
when it has proven a formula correct. Unlike previous QBF algorithms, ID
produces these Skolem functions without any overhead.

ID avoids costly representations of Skolem functions: It maintains a set $D \subseteq Y$
of variables that have a unique Skolem function, and its state includes a
formula $\delta$ characterizing the input-output behavior of the Skolem functions for
variables $D$. Formula $\delta$ satisfies $\forall X.\,\exists! D.\,\delta$, where $\exists! D$ means that there
exists exactly one assignment to $D$. We can thus think of $\delta$ also as a function $f_\delta$
mapping $X$ assignments to $D$ assignments.


Example 2. Back to our running example. After identifying a unique Skolem
function for $y_1$, formula $\delta$ consists exactly of the first two clauses of the formula,
$(x_1 \vee \neg y_1) \wedge (\neg x_1 \vee y_1)$. After adding the decision clause and identifying a
unique Skolem function for $y_2$, $\delta$ consists exactly of the first four clauses and the
decision clause. ◁


4.3 Conflict Checks in ID 


The formulas representing functions have primarily one purpose: to check for the
existence of conflicts. Whenever we attempt to grow the set $D$ by a variable $v$,
we need to check whether $v$ has a unique Skolem function. This check consists
of two parts; given an arbitrary universal assignment $\mathbf{x} \in 2^X$,


(1) is there at most one legal assignment to $v$, and
(2) is there at least one legal assignment to $v$?


To formally define this, let us consider the clauses $(d_1 \vee \dots \vee d_n \vee l)$ in $\varphi$
that contain a literal $l$ of variable $v$ and otherwise only contain literals $d_i$ of
variables in $D$ and $X$. We call these the clauses with unique consequence, as they
can be read as implications $(\neg d_1 \wedge \dots \wedge \neg d_n \Rightarrow l)$, and we call
$\neg d_1 \wedge \dots \wedge \neg d_n$ the antecedent of that clause. Further, we define $A_l$ as the
disjunction over all antecedents of literal $l$. (Note that $A_l$ depends on $D$ and
therefore changes as the state of the solver progresses.)

The two checks from above can now be defined as follows:

$$\forall X.\ \delta \Rightarrow A_v \vee A_{\neg v} \tag{1}$$
$$\forall X.\ \delta \Rightarrow \neg(A_v \wedge A_{\neg v}) \tag{2}$$

Checking for case (1) can be efficiently approximated [22], but checking for
case (2) cannot easily be avoided. We thus query a SAT solver with
$\delta \wedge A_v \wedge A_{\neg v}$ to perform a conflict check.

Example 3. We revisit the conflict described in Example 1. The starting point
is the situation when $D = \{y_1, y_2\}$ and $\delta$ consists of the first four clauses of
the formula as well as the decision clause $(x_2 \vee \neg y_2)$. The antecedents of $y_3$ are
$A_{y_3} = y_1$ and $A_{\neg y_3} = \neg y_2 \vee \neg x_2$. It is easy to verify that the assignment
$x_1 \mapsto 1$, $x_2 \mapsto 0$, $y_1 \mapsto 1$, $y_2 \mapsto 0$ satisfies the conflict criterion
$\delta \wedge A_{y_3} \wedge A_{\neg y_3}$. ◁
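The antecedents and the conflict check of Example 3 can be reproduced by brute force. In this Python sketch, the SAT call is replaced by enumerating all assignments to $X \cup D$, which is only viable for toy formulas, and clauses are lists of (variable, polarity) pairs, an encoding chosen here for readability rather than CADET's data structures.

```python
from itertools import product

# Running example (Example 1); literals are (variable, polarity) pairs.
PHI = [
    [("x1", True), ("y1", False)],                 # clause 1
    [("x1", False), ("y1", True)],                 # clause 2
    [("y1", True), ("y2", False)],                 # clause 3
    [("y1", False), ("x2", False), ("y2", True)],  # clause 4
    [("y1", False), ("y3", True)],                 # clause 5
    [("y2", True), ("y3", False)],                 # clause 6
    [("x2", True), ("y3", False)],                 # clause 7
]
DECISION = [("x2", True), ("y2", False)]
X, D = {"x1", "x2"}, {"y1", "y2"}
DELTA = PHI[:4] + [DECISION]  # delta defines y1 and y2 as functions of X


def antecedents(var, polarity, clauses, known):
    """Antecedents of all clauses with unique consequence (var, polarity);
    each one is the conjunction of the negated remaining literals."""
    result = []
    for clause in clauses:
        rest = [lit for lit in clause if lit[0] != var]
        if (var, polarity) in clause and all(v in known for v, _ in rest):
            result.append([(v, not p) for v, p in rest])
    return result


def holds_some(cubes, a):
    """Evaluate a disjunction of cubes (the formula A_l) under assignment a."""
    return any(all(a[v] == p for v, p in cube) for cube in cubes)


def conflict_assignments(var):
    """Models of delta and A_var and A_not_var, by enumeration over X and D."""
    known = X | D
    clauses = PHI + [DECISION]
    a_pos = antecedents(var, True, clauses, known)
    a_neg = antecedents(var, False, clauses, known)
    names = sorted(known)
    models = []
    for bits in product((False, True), repeat=len(names)):
        a = dict(zip(names, bits))
        in_delta = all(any(a[v] == p for v, p in c) for c in DELTA)
        if in_delta and holds_some(a_pos, a) and holds_some(a_neg, a):
            models.append(a)
    return models


print(antecedents("y3", False, PHI + [DECISION], X | D))  # [[('y2', False)], [('x2', False)]]
print(conflict_assignments("y3"))  # [{'x1': True, 'x2': False, 'y1': True, 'y2': False}]
```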


4.4 Functional Synthesis 


Remember that in the case of functional synthesis for $\varphi$ over sets of variables $X$
and $Y$, we search for a function $f : 2^X \to 2^Y$ such that $f$ produces a satisfying
assignment whenever it can, but can produce anything when there is no assignment
to $Y$ satisfying the formula. In case there are satisfying assignments to $Y$
for all $X$, we can simply run ID as if it were the QBF $\forall X.\,\exists Y.\,\varphi$ to obtain a Skolem
function that also satisfies the functional synthesis criterion. In the other case,
where there is an assignment to $X$ for which no assignment to $Y$ satisfies $\varphi$, ID for
2QBF would eventually detect a conflict that did not depend on a decision and
return UNSAT.

In order to lift ID to functional synthesis, we want to ignore universal assign- 
ments that have no satisfying assignment to Y. A simple way to suppress these 
conflicts is to add $\varphi$ to the conflict check. In order for an assignment to $X$
to remain a conflict, we must now additionally find an assignment to Y that 
demonstrates that the conflict could be prevented by a different decision. 

All other parts of ID, including the extraction of functions, remain untouched. 
In particular, termination is still guaranteed, as the greedy model construction 
either results in a function for all variables in Y or in a conflict, upon which at 
least one model is excluded through resolution. 


Example 4. For the conflict in our running example, the universal assignment
$x_1 \mapsto 1$, $x_2 \mapsto 0$ is excluded in the modified conflict check. Consider the UNSAT
core consisting of clauses 2, 5, and 7 for that universal assignment: propagate
$y_1 \mapsto 1$ using clause 2; propagate $y_3 \mapsto 1$ using clause 5; and finally propagate
$y_3 \mapsto 0$ using clause 7. So, instead of going into conflict analysis and backtracking,
ID for functional synthesis concludes that it has found a function for all
existential variables and terminates.
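The effect of the modification on the running example can be sketched in a few lines of Python. The functions for $y_1$ and $y_2$ and the antecedents of $y_3$ are hard-coded from Examples 1 and 3. We read "adding $\varphi$ to the conflict check" as conjoining a copy of $\varphi$ whose $Y$-variables are fresh, so that the witnessing assignment to $Y$ may differ from the one produced by the current functions; this reading is our assumption. Enumeration again stands in for the SAT solver.

```python
from itertools import product

B = (False, True)


def phi(x1, x2, y1, y2, y3):
    """The running example (Example 1) with clauses numbered 1-7."""
    return all([
        x1 or not y1,            # 1
        not x1 or y1,            # 2
        y1 or not y2,            # 3
        not y1 or not x2 or y2,  # 4
        not y1 or y3,            # 5
        y2 or not y3,            # 6
        x2 or not y3,            # 7
    ])


def conflicts(modified):
    """Universal assignments that fail the conflict check for y3."""
    found = []
    for x1, x2 in product(B, repeat=2):
        y1 = x1                       # f_y1(X) = x1 (Example 1)
        y2 = y1 and x2                # f_y2(X) = f_y1(X) and x2 (decision clause)
        a_pos = y1                    # A_y3 (Example 3)
        a_neg = (not y2) or (not x2)  # A_not_y3 (Example 3)
        if not (a_pos and a_neg):
            continue
        # Modified check: phi over a fresh copy of Y must also be satisfiable.
        if modified and not any(phi(x1, x2, *ys) for ys in product(B, repeat=3)):
            continue
        found.append((x1, x2))
    return found


print(conflicts(modified=False))  # [(True, False)]
print(conflicts(modified=True))   # []
```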


4.5 Quantifier Elimination 


Given a formula $\exists Y.\,\varphi$ with free variables $X$, quantifier elimination is the problem
of finding a formula $\psi \equiv \exists Y.\,\varphi$ over variables $X$ only. Hence, given a solution $f$ to
the functional synthesis problem for $\varphi$, we only have to substitute $Y$ by $f$ in $\varphi$,
i.e., $\psi = \varphi[Y \mapsto f(X)]$, to obtain the projected formula.
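Quantifier elimination by substitution is easy to demonstrate on a toy formula. In this Python sketch, the formula $\varphi = (y \Leftrightarrow x_1 \wedge x_2) \wedge (y \vee x_3)$ and the function $f_y = x_1 \wedge x_2$ are hypothetical. The projected formula $\psi$ is obtained by plugging $f_y$ into $\varphi$, and the loop checks $\psi \equiv \exists y.\,\varphi$ by enumeration. Symbolically, $\psi$ simplifies to $(x_1 \wedge x_2) \vee x_3$.

```python
from itertools import product


def phi(x1, x2, x3, y):
    # Hypothetical formula: (y <-> (x1 and x2)) and (y or x3).
    return y == (x1 and x2) and (y or x3)


def f_y(x1, x2, x3):
    # A solution to the functional synthesis problem for phi.
    return x1 and x2


def psi(x1, x2, x3):
    """Projected formula psi = phi[y -> f_y(X)], over X only."""
    return phi(x1, x2, x3, f_y(x1, x2, x3))


# psi agrees with exists y. phi on every assignment to X.
for x in product((False, True), repeat=3):
    assert psi(*x) == any(phi(*x, y) for y in (False, True))
```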


5 Experimental Evaluation 


We implemented the modifications to ID in CADET,¹ a competitive 2QBF
solver [22]. In this section, we compare CADET experimentally with existing
algorithms for functional synthesis. Additionally, we implemented a certificate
checker for functional synthesis and for quantifier elimination, to make sure that
the computed functions are correct. The certificate checker shares only the code
for AIGER circuits and the SAT solver (of which we have tried several) with CADET,
but is otherwise completely independent, to reduce the chance of correlated bugs.
The results of CADET have been checked with the certificate checker; the running
times reported below exclude the time to check the certificates.

¹ CADET is available at https://github.com/MarkusRabe/cadet.

Fig. 1. Log-scale cactus plot comparing the performance over all instances.

So far, there is no standard benchmark for functional synthesis or quantifier 
elimination. Like previous works on functional synthesis, we resort to using the
2QBF benchmarks from QBFEVAL’17 [14] and reinterpret them as functional
synthesis problems. The 2QBF benchmark set from QBFEVAL’17 is a collection of
384 formulas from various domains, mostly from software verification, program 
synthesis, and logical equivalences [33-36]. 

We compare CADET to the most recent tools on functional synthesis, BaFSyn [8]
and BFSS [10], the latter of which has been shown to consistently outperform the
earlier, BDD-based tools SSyft [25] and RSynth [26,27]. We ran CADET in two
configurations: with (CADET+) and without (CADET) its CEGAR module [23]. We
present the results as a cactus plot (Fig. 1), which is obtained by running each tool
on all formulas and sorting the running times for each tool separately. A point
$(x, y)$ in this plot means that $x$ formulas were solved in less than time $y$. Note that
the time axis is in log scale.
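The cactus-plot construction described above amounts to sorting the solved instances by running time. This Python sketch computes the points for one tool; representing unsolved runs by None and passing an explicit timeout are assumptions made for the illustration, since the time limit is not stated here.

```python
def cactus_points(runtimes, timeout):
    """Cactus-plot points for one tool.

    runtimes: running time per formula in seconds, or None if unsolved.
    The i-th point (i, t) says that i formulas were solved within t seconds.
    """
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    return [(i, t) for i, t in enumerate(solved, start=1)]


# Hypothetical running times for four formulas, one of them unsolved.
print(cactus_points([3.0, None, 0.5, 10.0], timeout=5.0))  # [(1, 0.5), (2, 3.0)]
```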

CADET shows a clear edge in performance: it is one to two orders of mag- 
nitude faster than its strongest competitor, BFSS, and can solve significantly 
more formulas. But despite the clear performance advantage in this aggregate 
view, BaFSyn and BFSS can be faster for individual formulas or subfamilies of 
QBFEval, as shown in previous works [8,10]. 

6 The Current State of CADET


Originally designed as an experimentation platform, CADET has grown to 
become a performant and versatile tool for the synthesis of Boolean functions. 
It consistently wins awards at the annual QBFEVAL competitions, and is the 
only such tool able to prove all its results [14]. 

CADET reads specifications in the QDIMACS and the QAIGER formats, 
and now supports the synthesis of Boolean functions for 2QBF, functional syn- 
thesis, and quantifier elimination with the command line options -c [file], -f 
[file], and -e [file]. The functions computed by CADET are much smaller 
compared to those found by CEGAR-based algorithms [22], and in its default 
configuration, CADET double-checks its results before reporting them. This can 
be deactivated by the flag --dontverify. 
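As a usage illustration, the running example from Sect. 4 can be written in the QDIMACS format and handed to CADET. This Python sketch only produces the file contents; the variable numbering ($x_1, x_2, y_1, y_2, y_3 \mapsto 1, \dots, 5$) is our choice, and an invocation such as `cadet -f running_example.qdimacs` is an assumption based solely on the option list above (the binary name and file name are hypothetical).

```python
def to_qdimacs(universals, existentials, clauses):
    """Serialize forall X. exists Y. phi (phi in CNF) in the QDIMACS format."""
    num_vars = max(universals + existentials)
    lines = [
        f"p cnf {num_vars} {len(clauses)}",
        "a " + " ".join(map(str, universals)) + " 0",
        "e " + " ".join(map(str, existentials)) + " 0",
    ]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"


# Running example with x1 = 1, x2 = 2, y1 = 3, y2 = 4, y3 = 5.
running_example = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [-3, 5], [4, -5], [2, -5]]
text = to_qdimacs([1, 2], [3, 4, 5], running_example)
print(text.splitlines()[:3])  # ['p cnf 5 7', 'a 1 2 0', 'e 3 4 5 0']
```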

It has also been integrated into py-aiger [37], a Python package by Marcell
Vazquez-Chanlatte for the convenient handling of circuits, which enables us
to easily model and prototype new approaches. For example, we can write:


import aiger_analysis as aa
import aigerbv as bv

x = bv.atom(32, 'x')  # Create a 32-bit variable
y = bv.atom(32, 'y')

expr = (x != y)

result = aa.eliminate(expr, ['y'])

assert aa.is_equal(x, result)


CADET also has an experimental reinforcement learning interface that allows 
us to automatically learn decision heuristics with the help of graph neural net- 
works. A recent effort shows that there is huge potential in learning better 
branching heuristics from scratch [38]. 


7 Conclusions 


In this work, we extended ID with the ability to solve functional synthesis and 
quantifier elimination problems. The extension is very simple—we only need 
to add the clauses of the original formula to its conflict check. The resulting 
algorithm significantly outperforms previous algorithms for functional synthesis. 


Acknowledgements. The author wants to thank Shubham Goel, Shetal Shah, and

Abstract. This paper presents a technique for computing numerical
loop summaries. The method synthesizes a rational vector addition sys- 
tem with resets (Q-VASR) that simulates the action of an input loop, and 
then uses the reachability relation of that Q-VASR to over-approximate 
the behavior of the loop. The key technical problem solved in this paper 
is to automatically synthesize a Q-VASR that is a best abstraction of 
a given loop in the sense that (1) it simulates the loop and (2) it is 
simulated by any other Q-VASR that simulates the loop. Since our loop 
summarization scheme is based on computing the exact reachability rela- 
tion of a best abstraction of a loop, we can make theoretical guarantees 
about its behavior. Moreover, we show experimentally that the technique 
is precise and performant in practice. 


1 Introduction 


Modern software verification techniques employ a number of heuristics for rea- 
soning about loops. While these heuristics are often effective, they are unpre- 
dictable. For example, an abstract interpreter may fail to find the most precise 
invariant expressible in the language of its abstract domain due to imprecise 
widening, or a software model checker might fail to terminate because it generates
interpolants that are insufficiently general. This paper presents a loop
summarization technique that is capable of generating loop invariants in an 
expressive and decidable language and provides theoretical guarantees about 
invariant quality. 

The key idea behind our technique is to leverage reachability results of vector 
addition systems (VAS) for invariant generation. Vector addition systems are a 
class of infinite-state transition systems with decidable reachability, classically 
used as a model of parallel systems [12]. We consider a variation of VAS, rational 
VAS with resets (Q-VASR), wherein there is a finite number of rational-typed 
variables and a finite set of transitions that simultaneously update each variable 
in the system by either adding a constant value or (re)setting the variable to 
a constant value. Our interest in Q-VASRs stems from the fact that there is a
(polytime) procedure to compute a linear arithmetic formula that represents a
Q-VASR’s reachability relation [8].
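To illustrate the transition structure described above, here is a minimal Python sketch of Q-VASR transitions. Each transition is written as a 0/1 vector saying which variables are kept (and incremented) versus reset, plus a vector of rational constants, so that $x'_i = r_i x_i + a_i$; this encoding, the class name, and the two-variable example system are our assumptions. The sketch only simulates individual runs; it does not compute the symbolic reachability relation [8].

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple


@dataclass(frozen=True)
class Transition:
    """A Q-VASR transition: keep[i] == 1 adds add[i] to variable i,
    keep[i] == 0 resets variable i to add[i]."""
    keep: Tuple[int, ...]
    add: Tuple[Fraction, ...]

    def apply(self, state):
        # x'_i = keep_i * x_i + add_i, updating all variables simultaneously
        return tuple(k * x + a for k, x, a in zip(self.keep, state, self.add))


def run(state, word):
    """State reached by firing the transitions in word, in order, from state."""
    for transition in word:
        state = transition.apply(state)
    return state


# Hypothetical two-variable system (i, s).
step = Transition((1, 1), (Fraction(1), Fraction(1, 2)))  # i += 1; s += 1/2
restart = Transition((0, 1), (Fraction(0), Fraction(0)))  # i := 0; s unchanged
print(run((Fraction(0), Fraction(0)), [step, step, restart, step]))
```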

Since the reachability relation of a Q-VASR is computable, the dynamics 
of Q-VASR can be analyzed without relying on heuristic techniques. However, 
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there is a gap between Q-VASR and the loops that we are interested in summa- 
rizing. The latter typically use a rich set of operations (memory manipulation, 
conditionals, non-constant increments, non-linear arithmetic, etc) and cannot be 
analyzed precisely. We bridge the gap with a procedure that, for any loop, syn- 
thesizes a Q-VASR that simulates it. The reachability relation of the Q-VASR 
can then be used to over-approximate the behavior of the loop. Moreover, we 
prove that if a loop is expressed in linear rational arithmetic (LRA), then our 
procedure synthesizes a best Q-VASR abstraction, in the sense that it simulates 
any other Q-VASR that simulates the loop. That is, imprecision in the analysis 
is due to inherent limitations of the Q-VASR model, rather than to heuristic algorithmic
choices. 

One limitation of the model is that Q-VASRs over-approximate multi-path 
loops by treating the choice between paths as non-deterministic. We show that 
Q-VASRS, Q-VASR extended with control states, can be used to improve our 
invariant generation scheme by encoding control flow information and inter- 
path control dependencies that are lost in the Q-VASR abstraction. We give an 
algorithm for synthesizing a Q-VASRS abstraction of a given loop, which (like 
our Q-VASR abstraction algorithm) synthesizes best abstractions under certain 
assumptions. 

Finally, we note that our analysis techniques extend to complex control struc- 
tures (such as nested loops) by employing summarization compositionally (i.e., 
“bottom-up”). For example, our analysis summarizes a nested loop by first summarizing
its inner loops and then using the summaries to analyze the outer loop.
As a result of compositionality, our analysis can be applied to partial programs, 
is easy to parallelize, and has the potential to scale to large code bases. 

The main contributions of the paper are as follows: 


— We present a procedure to synthesize Q-VASR abstractions of transition for- 
mulas. For transition formulas in linear rational arithmetic, the synthesized 
Q-VASR abstraction is a best abstraction. 

— We present a technique for improving the precision of our analysis by using 
Q-VASR with states to capture loop control structure. 

— We implement the proposed loop summarization techniques and show that 
their ability to verify user assertions is comparable to software model checkers, 
while at the same time providing theoretical guarantees of termination and 
invariant quality. 


1.1 Outline 


This section illustrates the high-level structure of our invariant generation
scheme. The goal is to compute a transition formula that summarizes the behavior
of a given program. A transition formula is a formula over a set of program
variables Var along with primed copies Var′, representing the state of the program
before and after executing a computation (respectively).

(a) Persistent queue

    procedure enqueue(elt):
        back := cons(elt, back)
        size := size + 1

    procedure dequeue():
        if (front == nil) then
            // Reverse back, append to front
            while (back != nil) do
                front := cons(head(back), front)
                back := tail(back)
        result := head(front)
        front := tail(front)
        size := size - 1
        return result

(b) Integer model & harness

    procedure enqueue():
        back_len := back_len + 1
        mem_ops := mem_ops + 1
        size := size + 1

    procedure dequeue():
        if (front_len == 0) then
            while (back_len != 0) do
                front_len := front_len + 1
                back_len := back_len - 1
                mem_ops := mem_ops + 3
        size := size - 1
        front_len := front_len - 1
        mem_ops := mem_ops + 2

    procedure harness():
        nb_ops := 0
        while nondet() do
            nb_ops := nb_ops + 1
            if (size > 0 && nondet()) then
                dequeue()
            else
                enqueue()

Fig. 1. A persistent queue and integer model. back_len and front_len model the
lengths of the lists back and front; mem_ops counts the number of memory operations
in the computation.

For any given program P, a transition formula TF⟦P⟧ can be computed by recursion on syntax:¹


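$$
\begin{aligned}
\mathit{TF}[\![x := e]\!] &\triangleq x' = e \land \bigwedge_{y \in \mathit{Var},\, y \neq x} y' = y\\
\mathit{TF}[\![\textbf{if } c \textbf{ then } P_1 \textbf{ else } P_2]\!] &\triangleq (c \land \mathit{TF}[\![P_1]\!]) \lor (\lnot c \land \mathit{TF}[\![P_2]\!])\\
\mathit{TF}[\![P_1;\, P_2]\!] &\triangleq \exists X \in \mathbb{Z}.\ \mathit{TF}[\![P_1]\!][\mathit{Var}' \mapsto X] \land \mathit{TF}[\![P_2]\!][\mathit{Var} \mapsto X]\\
\mathit{TF}[\![\textbf{while } c \textbf{ do } P]\!] &\triangleq (c \land \mathit{TF}[\![P]\!])^{\star} \land (\lnot c[\mathit{Var} \mapsto \mathit{Var}'])
\end{aligned}
$$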


where (−)⋆ is a function that computes an over-approximation of the transitive
closure of a transition formula. The contribution of this paper is a method for
computing this (−)⋆ operation, which is based on first over-approximating the
input transition formula by a Q-VASR, and then computing the (exact) reachability
relation of the Q-VASR.
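The recursion above is symbolic. To make its shape concrete, the following Python sketch replays the same four rules on an explicit-state stand-in. It assumes, purely for illustration, two variables x and y ranging over {0, …, 3}, represents a transition formula by the finite set of (pre-state, post-state) pairs it denotes, and computes (−)⋆ exactly as a reflexive-transitive closure. This is not the paper's procedure, whose point is to over-approximate (−)⋆ symbolically over unbounded domains; the names `tf_assign`, `tf_if`, `tf_seq`, `tf_while`, and `star` are hypothetical.

```python
from itertools import product

# Illustrative assumption: two program variables ranging over {0, ..., 3}, so a
# transition formula can be represented extensionally as a finite set of
# (pre-state, post-state) pairs and (-)* can be computed exactly.
VARS = ("x", "y")
BOUND = 4
STATES = list(product(range(BOUND), repeat=len(VARS)))


def tf_assign(var, expr):
    """TF[var := expr]: var' = expr and every other variable is unchanged."""
    i = VARS.index(var)
    rel = set()
    for s in STATES:
        value = expr(s)
        if 0 <= value < BOUND:  # transitions leaving the finite domain are dropped
            rel.add((s, s[:i] + (value,) + s[i + 1:]))
    return rel


def tf_if(cond, rel1, rel2):
    """TF[if c then P1 else P2] = (c and TF[P1]) or (not c and TF[P2])."""
    return ({(u, v) for (u, v) in rel1 if cond(u)}
            | {(u, v) for (u, v) in rel2 if not cond(u)})


def tf_seq(rel1, rel2):
    """TF[P1; P2]: the intermediate state X becomes the shared middle state."""
    return {(u, w) for (u, mid) in rel1 for (mid2, w) in rel2 if mid == mid2}


def star(rel):
    """Reflexive-transitive closure; exact here only because STATES is finite."""
    closure = {(s, s) for s in STATES}
    frontier = set(closure)
    while frontier:
        frontier = tf_seq(frontier, rel) - closure
        closure |= frontier
    return closure


def tf_while(cond, body):
    """TF[while c do P] = (c and TF[P])* and not c evaluated on the post-state."""
    guarded = {(u, v) for (u, v) in body if cond(u)}
    return {(u, v) for (u, v) in star(guarded) if not cond(v)}


# while x < 3 do x := x + 1, started from x = 0, y = 2
loop = tf_while(lambda s: s[0] < 3, tf_assign("x", lambda s: s[0] + 1))
print(sorted(v for (u, v) in loop if u == (0, 2)))  # [(3, 2)]
```

Sequencing is relational composition, and the while rule keeps only post-states that falsify the guard, mirroring ¬c[Var ↦ Var′].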


1 This style of analysis can be extended from a simple block-structured language to 
one with control flow and recursive procedures using the framework of algebraic 
program analysis [13,23]. 
We illustrate the analysis on an integer model of a persistent queue data
structure, pictured in Fig. 1. The example consists of two operations (enqueue
and dequeue), as well as a test harness (harness) that non-deterministically
executes enqueue and dequeue operations. The queue achieves O(1) amortized
memory operations (mem_ops) in enqueue and dequeue by implementing the queue
as two lists, front and back (whose lengths are modeled as front_len and 
back_len, respectively): the sequence of elements in the queue is the front list 
followed by the reverse of the back list. We will show that the queue functions 
use O(1) amortized memory operations by finding a summary for harness that 
implies a linear bound on mem_ops (the number of memory operations in the com- 
putation) in terms of nb_ops (the total number of enqueue/dequeue operations 
executed in some sequence of operations). 

We analyze the queue compositionally, in “bottom-up” fashion (i.e., start- 
ing from deeply-nested code and working our way back up to a summary for 
harness). There are two loops of interest, one in dequeue and one in harness. 
Since the dequeue loop is nested inside the harness loop, dequeue is analyzed 
first. We start by computing a transition formula that represents one execution 
of the body of the dequeue loop: 


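$$
\mathit{Body}_{deq} \triangleq \mathit{back\_len} > 0 \land
\left(\begin{array}{l}
\mathit{front\_len}' = \mathit{front\_len} + 1\\
\land\ \mathit{back\_len}' = \mathit{back\_len} - 1\\
\land\ \mathit{mem\_ops}' = \mathit{mem\_ops} + 3\\
\land\ \mathit{size}' = \mathit{size}
\end{array}\right)
$$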


Observe that each variable in the loop is incremented by a constant value. As a 
result, the loop update can be captured faithfully by a vector addition system. 
In particular, we see that this loop body formula is simulated by the Q-VASR
V_deq (below), where the correspondence between the state-space of Body_deq and
V_deq is given by the identity transformation (i.e., each dimension of V_deq simply
represents one of the variables of Body_deq).


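$$
\begin{bmatrix} w\\ x\\ y\\ z \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mathit{front\_len}\\ \mathit{back\_len}\\ \mathit{mem\_ops}\\ \mathit{size} \end{bmatrix};
\qquad
V_{deq} = \left\{
\begin{bmatrix} w\\ x\\ y\\ z \end{bmatrix} \to \begin{bmatrix} w+1\\ x-1\\ y+3\\ z \end{bmatrix}
\right\}
$$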


A formula representing the reachability relation of a vector addition system can 
be computed in polytime. For the case of V_deq, a formula representing k steps of
the Q-VASR is simply

$$
w' = w + k \land x' = x - k \land y' = y + 3k \land z' = z. \tag{1}
$$
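As a quick sanity check of formula (1), the Python sketch below (helper names are hypothetical) iterates the single transformer of V_deq, which has no resets and addition vector (1, −1, 3, 0) over (w, x, y, z), and compares each iterate with the closed form. Exact rationals come from the standard `fractions` module.

```python
from fractions import Fraction

# V_deq over (w, x, y, z): reset vector all ones (no resets), addition (1, -1, 3, 0).
R_DEQ = (1, 1, 1, 1)
A_DEQ = (1, -1, 3, 0)


def step(r, a, u):
    """One Q-VASR transition: v = r * u + a, computed pointwise."""
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))


def closed_form(u, k):
    """Formula (1): w' = w + k, x' = x - k, y' = y + 3k, z' = z."""
    w, x, y, z = u
    return (w + k, x - k, y + 3 * k, z)


def agrees_with_closed_form(start, max_k):
    """Check that k applications of the transformer match (1) for k = 0..max_k."""
    state = start
    for k in range(max_k + 1):
        if state != closed_form(start, k):
            return False
        state = step(R_DEQ, A_DEQ, state)
    return True


start = tuple(Fraction(v) for v in (0, 5, 10, 2))
print(agrees_with_closed_form(start, 20))  # True
```

Because V_deq has no resets, k steps simply add k times the addition vector, which is why its reachability relation is a single linear formula in k.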


To capture information about the pre-condition of the loop, we can project the
primed variables to obtain back_len > 0; similarly, for the post-condition, we can
project the unprimed variables to obtain back_len′ ≥ 0. Finally, combining (1)
(translated back into the vocabulary of the program) and the pre/post-condition,
we form the following approximation of the dequeue loop’s behavior:


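$$
\exists k.\ k \ge 0 \land
\left(\begin{array}{l}
\mathit{front\_len}' = \mathit{front\_len} + k\\
\land\ \mathit{back\_len}' = \mathit{back\_len} - k\\
\land\ \mathit{mem\_ops}' = \mathit{mem\_ops} + 3k\\
\land\ \mathit{size}' = \mathit{size}
\end{array}\right)
\land \left(k = 0 \lor \left(k > 0 \land \mathit{back\_len} > 0 \land \mathit{back\_len}' \ge 0\right)\right)
$$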


Using this summary for the dequeue loop, we proceed to compute a transition 
formula for the body of the harness loop (omitted for brevity). Just as with the 
dequeue loop, we analyze the harness loop by synthesizing a Q-VASR that simulates
it, V_har (below), where the correspondence between the state space of the
harness loop and V_har is given by the transformation S_har:


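$$
\begin{bmatrix} v\\ w\\ x\\ y\\ z \end{bmatrix}
=
\underbrace{\begin{bmatrix}
0 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 0 & 0\\
0 & 3 & 1 & 0 & 0\\
1 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{S_{har}}
\begin{bmatrix} \mathit{front\_len}\\ \mathit{back\_len}\\ \mathit{mem\_ops}\\ \mathit{size}\\ \mathit{nb\_ops} \end{bmatrix};
\quad \text{i.e.,} \quad
\begin{array}{l}
\mathit{size} = v\\
\land\ \mathit{back\_len} = w\\
\land\ \mathit{mem\_ops} + 3\,\mathit{back\_len} = x\\
\land\ \mathit{back\_len} + \mathit{front\_len} = y\\
\land\ \mathit{nb\_ops} = z
\end{array}
$$

$$
V_{har} = \left\{
\underbrace{\begin{bmatrix} v\\ w\\ x\\ y\\ z \end{bmatrix} \to \begin{bmatrix} v+1\\ w+1\\ x+4\\ y+1\\ z+1 \end{bmatrix}}_{\text{enqueue}},\;
\underbrace{\begin{bmatrix} v\\ w\\ x\\ y\\ z \end{bmatrix} \to \begin{bmatrix} v-1\\ w\\ x+2\\ y-1\\ z+1 \end{bmatrix}}_{\text{dequeue fast}},\;
\underbrace{\begin{bmatrix} v\\ w\\ x\\ y\\ z \end{bmatrix} \to \begin{bmatrix} v-1\\ 0\\ x+2\\ y-1\\ z+1 \end{bmatrix}}_{\text{dequeue slow}}
\right\}
$$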


Unlike the dequeue loop, we do not get an exact characterization of the 
dynamics of each changed variable. In particular, in the slow dequeue path 
through the loop, the values of front_len, back_len, and mem_ops change by
a variable amount. Since back_len is set to 0, its behavior can be captured
by a reset. The dynamics of front_len and mem_ops cannot be captured by
a Q-VASR, but (using our dequeue summary) we can observe that the sum
front_len + back_len is decremented by 1, and the sum mem_ops + 3·back_len
is incremented by 2.
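These claimed dynamics can be spot-checked on concrete states. The following Python sketch (an illustration only; `harness_iteration`, `S_HAR`, `V_HAR`, and `check_simulation` are hypothetical names) executes one iteration of the harness loop of Fig. 1(b) along each feasible path, maps the pre- and post-states through S_har, and asserts that the abstract post-state is exactly the corresponding transformer of V_har applied to the abstract pre-state. States are tuples (front_len, back_len, mem_ops, size, nb_ops), and pre-states are sampled under the assumed queue invariant size = front_len + back_len.

```python
import random

# Rows of S_har over (front_len, back_len, mem_ops, size, nb_ops):
# v = size, w = back_len, x = mem_ops + 3*back_len, y = back_len + front_len, z = nb_ops.
S_HAR = [[0, 0, 0, 1, 0],
         [0, 1, 0, 0, 0],
         [0, 3, 1, 0, 0],
         [1, 1, 0, 0, 0],
         [0, 0, 0, 0, 1]]

# Transformers (reset vector r, addition vector a) of V_har over (v, w, x, y, z).
V_HAR = {
    "enqueue": ((1, 1, 1, 1, 1), (1, 1, 4, 1, 1)),
    "dequeue fast": ((1, 1, 1, 1, 1), (-1, 0, 2, -1, 1)),
    "dequeue slow": ((1, 0, 1, 1, 1), (-1, 0, 2, -1, 1)),
}


def apply_matrix(matrix, u):
    return [sum(m * x for m, x in zip(row, u)) for row in matrix]


def harness_iteration(state, path):
    """One iteration of the harness loop body of Fig. 1(b) along the given path."""
    front_len, back_len, mem_ops, size, nb_ops = state
    nb_ops += 1
    if path == "enqueue":
        back_len += 1
        mem_ops += 1
        size += 1
    else:  # dequeue; the slow path is the one that runs the inner loop
        if front_len == 0:
            while back_len != 0:
                front_len += 1
                back_len -= 1
                mem_ops += 3
        size -= 1
        front_len -= 1
        mem_ops += 2
    return (front_len, back_len, mem_ops, size, nb_ops)


def check_simulation(trials, seed):
    """Return the number of (state, path) pairs checked; raises AssertionError on a mismatch."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        front_len, back_len = rng.randint(0, 6), rng.randint(0, 6)
        pre = (front_len, back_len, rng.randint(0, 50), front_len + back_len, rng.randint(0, 20))
        paths = ["enqueue"]
        if front_len > 0:
            paths.append("dequeue fast")
        elif back_len > 0:
            paths.append("dequeue slow")
        for path in paths:
            r, a = V_HAR[path]
            expected = [ri * x + ai for ri, x, ai in zip(r, apply_matrix(S_HAR, pre), a)]
            assert apply_matrix(S_HAR, harness_iteration(pre, path)) == expected, path
            checked += 1
    return checked
```

A call such as `check_simulation(1000, seed=0)` returns the number of checks performed; any path whose abstract effect disagreed with V_har would raise an `AssertionError` naming that path.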

We compute the following formula that captures the reachability relation
of V_har (taking k₁ steps of enqueue, k₂ steps of dequeue fast, and k₃ steps of
dequeue slow) under the inverse image of the state correspondence S_har:


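$$
\begin{aligned}
& \mathit{size}' = \mathit{size} + k_1 - k_2 - k_3\\
\land\ & \big((k_3 = 0 \land \mathit{back\_len}' = \mathit{back\_len} + k_1) \lor (k_3 > 0 \land 0 \le \mathit{back\_len}' \le k_1)\big)\\
\land\ & \mathit{mem\_ops}' + 3\,\mathit{back\_len}' = \mathit{mem\_ops} + 3\,\mathit{back\_len} + 4k_1 + 2k_2 + 2k_3\\
\land\ & \mathit{front\_len}' + \mathit{back\_len}' = \mathit{front\_len} + \mathit{back\_len} + k_1 - k_2 - k_3\\
\land\ & \mathit{nb\_ops}' = \mathit{nb\_ops} + k_1 + k_2 + k_3
\end{aligned}
$$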


From the above formula (along with pre/post-condition formulas), we obtain 
a summary for the harness loop (omitted for brevity). Using this summary 
we can prove (supposing that we start in a state where all variables are zero)
that mem_ops is at most 4 times nb_ops (i.e., enqueue and dequeue use O(1) 
amortized memory operations). 
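A randomized execution is no substitute for the proof, but it illustrates the bound. The sketch below (Python; `run_harness` is a hypothetical name) runs the integer model of Fig. 1(b) from the all-zero state. It resolves each inner `nondet()` with a seeded pseudo-random coin, replaces the outer `while nondet()` with a fixed number of iterations, and checks mem_ops + 3·back_len ≤ 4·nb_ops after every iteration. Because back_len ≥ 0, this implies mem_ops ≤ 4·nb_ops.

```python
import random


def run_harness(iterations, seed):
    """Run the integer model of Fig. 1(b) from the all-zero state.

    nondet() is resolved by a seeded pseudo-random coin; the invariant
    mem_ops + 3*back_len <= 4*nb_ops is asserted after every iteration.
    """
    rng = random.Random(seed)
    front_len = back_len = mem_ops = size = nb_ops = 0
    for _ in range(iterations):
        nb_ops += 1
        if size > 0 and rng.random() < 0.5:
            # dequeue()
            if front_len == 0:
                while back_len != 0:
                    front_len += 1
                    back_len -= 1
                    mem_ops += 3
            size -= 1
            front_len -= 1
            mem_ops += 2
        else:
            # enqueue()
            back_len += 1
            mem_ops += 1
            size += 1
        assert mem_ops + 3 * back_len <= 4 * nb_ops
    return mem_ops, nb_ops


for seed in range(5):
    mem_ops, nb_ops = run_harness(10_000, seed)
    assert mem_ops <= 4 * nb_ops
```

The quantity mem_ops + 3·back_len is exactly the abstract dimension x of V_har, which grows by at most 4 per iteration while nb_ops grows by exactly 1.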


2 Background 


The syntax of ∃LIRA, the existential fragment of linear integer/rational
arithmetic, is given by the following grammar:

$$
\begin{aligned}
s, t \in \mathsf{Term} &::= c \mid x \mid s + t \mid c \cdot t\\
F, G \in \mathsf{Formula} &::= s < t \mid s = t \mid F \land G \mid F \lor G \mid \exists x \in \mathbb{Q}.\, F \mid \exists x \in \mathbb{Z}.\, F
\end{aligned}
$$

where x is a (rational sorted) variable symbol and c is a rational constant.
Observe that (without loss of generality) formulas are free of negation. ∃LRA
(linear rational arithmetic) refers to the fragment of ∃LIRA that omits
quantification over the integer sort.

A transition system is a pair (S, →) where S is a (potentially infinite) set
of states and → ⊆ S × S is a transition relation. For a transition relation →, we
use →* to denote its reflexive, transitive closure.

A transition formula is a formula F(x, x′) whose free variables range over
x = x₁, …, xₙ and x′ = x′₁, …, x′ₙ (we refer to the number n as the dimension
of F); these variables designate the state before and after a transition. In the
following, we assume that transition formulas are defined over ∃LIRA. For a
transition formula F(x, x′) and vectors of terms s and t, we use F(s, t) to denote
the formula F with each xᵢ replaced by sᵢ and each x′ᵢ replaced by tᵢ. A transition
formula F(x, x′) defines a transition system (S_F, →_F), where the state space S_F
is ℚⁿ and which can transition u →_F v iff F(u, v) is valid.

For two rational vectors a and b of the same dimension d, we use a · b to
denote the inner product a · b = Σ_{i=1..d} aᵢbᵢ and a ∗ b to denote the pointwise (aka
Hadamard) product (a ∗ b)ᵢ = aᵢbᵢ. For any natural number i, we use eᵢ to denote
the standard basis vector in the ith direction (i.e., the vector consisting of all
zeros except the ith entry, which is 1), where the dimension of eᵢ is understood
from context. We use Iₙ to denote the n × n identity matrix.


Definition 1. A rational vector addition system with resets (Q-VASR)
of dimension d is a finite set V ⊆ {0, 1}ᵈ × ℚᵈ of transformers. Each transformer
(r, a) ∈ V consists of a binary reset vector r, and a rational addition vector a,
both of dimension d. V defines a transition system (S_V, →_V), where the state
space S_V is ℚᵈ and which can transition u →_V v iff v = r ∗ u + a for some
(r, a) ∈ V.
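Definition 1 translates directly into code. The sketch below (Python; the function names are hypothetical) represents a Q-VASR as a set of (reset, addition) tuple pairs, computes the successors v = r ∗ u + a of a state, and enumerates the states reachable within a bounded number of steps, using the standard `fractions.Fraction` for exact rationals. Bounded enumeration is only an illustration; the point of Theorem 1 below is that the full reachability relation can be computed symbolically.

```python
from fractions import Fraction


def successors(V, u):
    """All v with u ->_V v, i.e. v = r * u + a (pointwise) for some (r, a) in V."""
    return {tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a)) for r, a in V}


def reachable_within(V, u, k):
    """States reachable from u in at most k steps (explicit, bounded exploration)."""
    seen = {tuple(u)}
    frontier = set(seen)
    for _ in range(k):
        frontier = {v for w in frontier for v in successors(V, w)} - seen
        seen |= frontier
    return seen


# A 2-dimensional Q-VASR: either add 1/2 to the first counter and reset the
# second to 0, or add 1 to the second counter.
V = {((1, 0), (Fraction(1, 2), 0)), ((1, 1), (0, 1))}
print(sorted(successors(V, (0, 0))))  # [(0, 1), (Fraction(1, 2), 0)]
```

A reset bit of 0 zeroes the coordinate before the addition, so a transformer with rᵢ = 0 and aᵢ = c sets coordinate i to the constant c.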


Definition 2. A rational vector addition system with resets and states
(Q-VASRS) of dimension d is a pair V = (Q, E), where Q is a finite set of
control states, and E ⊆ Q × {0, 1}ᵈ × ℚᵈ × Q is a finite set of edges labeled
by (d-dimensional) transformers. V defines a transition system (S_V, →_V), where
the state space S_V is Q × ℚᵈ and which can transition (q₁, u) →_V (q₂, v) iff there
is some edge (q₁, (r, a), q₂) ∈ E such that v = r ∗ u + a.
Our invariant generation scheme is based on the following result, which is a
simple consequence of the work of Haase and Halfon: 


Theorem 1 ([8]). There is a polytime algorithm which, given a d-dimensional
Q-VASRS V = (Q, E), computes an ∃LIRA transition formula reach(V) such
that for all u, v ∈ ℚᵈ, we have (p, u) →*_V (q, v) for some control states p, q ∈ Q
if and only if reach(V)(u, v) is valid.


Note that Q-VASR can be realized as Q-VASRS with a single control state, 
so this theorem also applies to Q-VASR. 
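The remark that a Q-VASR is a single-state Q-VASRS can likewise be sketched. Below (Python; names hypothetical), a Q-VASRS is a set of edges (q₁, r, a, q₂) as in Definition 2, a step must follow an edge leaving the current control state, and `from_vasr` embeds a Q-VASR by attaching every transformer to a self-loop on one control state. The two-state example, which forces two update paths to alternate, is our own illustration.

```python
def vasrs_successors(E, config):
    """All (q2, v) with (q1, u) ->_V (q2, v): follow an edge leaving q1, v = r * u + a."""
    q1, u = config
    return {
        (q2, tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a)))
        for (src, r, a, q2) in E
        if src == q1
    }


def from_vasr(V, state="q"):
    """View a Q-VASR as a Q-VASRS with one control state (self-loops only)."""
    return {(state, r, a, state) for (r, a) in V}


# Over (x, i): from control state "A" only i is incremented, from "B" both
# x and i are incremented, and each edge switches the control state.
E = {("A", (1, 1), (0, 1), "B"), ("B", (1, 1), (1, 1), "A")}
print(vasrs_successors(E, ("A", (0, 1))))  # {('B', (0, 2))}
```

With a single control state every edge is a self-loop, so the successor relation coincides with that of the original Q-VASR.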


3 Approximating Loops with Vector Addition Systems 


In this section, we describe a method for over-approximating the transitive 
closure of a transition formula using a Q-VASR. This procedure immediately 
extends to computing summaries for programs (including programs with nested 
loops) using the method outlined in Sect. 1.1. 

The core algorithmic problem that we answer in this section is: given a transi- 
tion formula, how can we synthesize a (best) abstraction of that formula’s dynam- 
ics as a Q-VASR? We begin by formalizing the problem: in particular, we define 
what it means for a Q-VASR to simulate a transition formula and what it means 
for an abstraction to be “best.” 


Definition 3. Let A = (ℚⁿ, →_A) and B = (ℚᵐ, →_B) be transition systems
operating over rational vector spaces. A linear simulation from A to B is a
linear transformation S ∈ ℚ^{m×n} such that for all u, v ∈ ℚⁿ for which u →_A v,
we have Su →_B Sv. We use A ⊩_S B to denote that S is a linear simulation
from A to B.


Suppose that F(x, x′) is an n-dimensional transition formula, V is a
d-dimensional Q-VASR, and S ∈ ℚ^{d×n} is a linear transformation. The key property
of simulations that underlies our loop summarization scheme is that if F ⊩_S V,
then reach(V)(Sx, Sx′) (i.e., the reachability relation of V under the inverse
image of S) over-approximates the transitive closure of F. Finally, we observe
that the simulation F ⊩_S V can equivalently be defined by the validity of the
entailment F ⊨ γ(S, V), where

$$
\gamma(S, V) \triangleq \bigvee_{(r, a) \in V} S\mathbf{x}' = r \ast S\mathbf{x} + a
$$

is a transition formula that represents the transitions that V simulates under
transformation S.
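For a single concrete transition, checking whether it satisfies γ(S, V) amounts to testing Sx′ = r ∗ Sx + a for each transformer. The following Python sketch (helper names hypothetical) does exactly that. It only tests individual pairs (u, v); establishing F ⊨ γ(S, V) for all models of F requires a satisfiability query, which this sketch does not attempt. The example abstraction tracks only the sum x + y, incremented by 1.

```python
from fractions import Fraction


def mat_vec(S, u):
    """Matrix-vector product S u over exact rationals."""
    return [sum(Fraction(s) * Fraction(x) for s, x in zip(row, u)) for row in S]


def in_gamma(S, V, u, v):
    """True iff the pair (u, v) satisfies gamma(S, V): S v = r * S u + a for some (r, a) in V."""
    su, sv = mat_vec(S, u), mat_vec(S, v)
    return any(
        all(svi == ri * sui + ai for svi, ri, sui, ai in zip(sv, r, su, a))
        for r, a in V
    )


# One abstract dimension x + y, with a single transformer that adds 1 to it.
S = [[1, 1]]
V = {((1,), (1,))}
print(in_gamma(S, V, (0, 0), (1, 0)))   # True:  x + y goes from 0 to 1
print(in_gamma(S, V, (0, 0), (-1, 2)))  # True:  only the sum is tracked
print(in_gamma(S, V, (0, 0), (2, 0)))   # False: the sum grows by 2
```

The second call shows why reach(V)(Sx, Sx′) is an over-approximation: any transition that updates the tracked linear combination correctly is accepted, however the individual variables change.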

Our task is to synthesize a linear transformation S and a Q-VASR V such
that F ⊩_S V. We call a pair (S, V), consisting of a rational matrix S ∈ ℚ^{d×n}
and a d-dimensional Q-VASR V, a Q-VASR abstraction. We say that n is the
concrete dimension of (S, V) and d is the abstract dimension. If F ⊩_S V, then
we say that (S, V) is a Q-VASR abstraction of F. A transition formula may
have many Q-VASR abstractions; we are interested in computing a Q-VASR
abstraction (S, V) that results in the most precise over-approximation of the
transitive closure of F. Towards this end, we define a preorder ≼ on Q-VASR
abstractions, where (S¹, V¹) ≼ (S², V²) iff there exists a linear transformation
T ∈ ℚ^{e×d} such that V¹ ⊩_T V² and TS¹ = S² (where d and e are the abstract
dimensions of (S¹, V¹) and (S², V²), respectively). Observe that if (S¹, V¹) ≼
(S², V²), then reach(V¹)(S¹x, S¹x′) ⊨ reach(V²)(S²x, S²x′).

Thus, our problem can be stated as follows: given a transition formula F,
synthesize a Q-VASR abstraction (S, V) of F such that (S, V) is best in the
sense that we have (S, V) ≼ (S̃, Ṽ) for any Q-VASR abstraction (S̃, Ṽ) of F. A
solution to this problem is given in Algorithm 1.


Algorithm 1. abstract-VASR(F)

input : Transition formula F of dimension n
output: Q-VASR abstraction of F; best Q-VASR abstraction if F is in ∃LRA
1  Skolemize existentials of F;
2  (S, V) ← (I_n, ∅);  // (I_n, ∅) is least in ≼ order
3  Γ ← F;
4  while Γ is satisfiable do
5      Let M be a model of Γ;
6      C ← cube of the DNF of F with M ⊨ C;
7      (S, V) ← (S, V) ⊔ α̂(C);
8      Γ ← Γ ∧ ¬γ(S, V)
9  return (S, V)


Algorithm 1 follows the familiar pattern of an AllSat-style loop. The algorithm
takes as input a transition formula F. It maintains a Q-VASR abstraction
(S, V) and a formula Γ, whose models correspond to the transitions of F that
are not simulated by (S, V). The idea is to build (S, V) iteratively by sampling
transitions from Γ, augmenting (S, V) to simulate the sample transition, and
then updating Γ accordingly. We initialize (S, V) to be (I_n, ∅), the canonical
least Q-VASR abstraction in ≼ order, and Γ to be F (i.e., (I_n, ∅) does not
simulate any transitions of F). Each loop iteration proceeds as follows. First, we
sample a model M of Γ (i.e., a transition that is allowed by F but not simulated
by (S, V)). We then generalize that transition to a set of transitions by using M
to select a cube C of the DNF of F that contains M. Next, we use the procedure
described in Sect. 3.1 to compute a Q-VASR abstraction α̂(C) that simulates the
transitions of C. We then update the Q-VASR abstraction (S, V) to be the least
upper bound of (S, V) and α̂(C) (w.r.t. ≼ order) using the procedure described
in Sect. 3.2 (line 7). Finally, we block any transition simulated by the least upper
bound (including every transition in C) from being sampled again by conjoining
¬γ(S, V) to Γ. The loop terminates when Γ is unsatisfiable, in which case we
have that F ⊩_S V. Theorem 2 gives the correctness statement for this algorithm.
Theorem 2. Given a transition formula F, Algorithm 1 computes a simulation
S and Q-VASR V such that F ⊩_S V. Moreover, if F is in ∃LRA, Algorithm 1
computes a best Q-VASR abstraction of F.


The proof of this theorem, as well as the proofs of all subsequent theorems,
lemmas, and propositions, are in the extended version of this paper [20].


3.1 Abstracting Conjunctive Transition Formulas 


This section shows how to compute a Q-VASR abstraction for a consistent
conjunctive formula. When the input formula is in ∃LRA, the computed Q-VASR
abstraction will be a best Q-VASR abstraction of the input formula. The intuition
is that, since ∃LRA is a convex theory, a best Q-VASR abstraction consists of a
single transition. For ∃LIRA formulas, our procedure produces a Q-VASR abstraction
that is not guaranteed to be best, precisely because ∃LIRA is not convex.

Let C be a consistent, conjunctive transition formula. Observe that the set
Res_C ≜ {(s, a) : C ⊨ s · x′ = a}, which represents linear combinations of
variables that are reset across C, forms a vector space. Similarly, the set
Inc_C ≜ {(s, a) : C ⊨ s · x′ = s · x + a}, which represents linear combinations
of variables that are incremented across C, forms a vector space. We compute
bases for both Res_C and Inc_C, say {(s₁, a₁), …, (s_m, a_m)} and
{(s_{m+1}, a_{m+1}), …, (s_d, a_d)}, respectively. We define α̂(C) to be the
Q-VASR abstraction α̂(C) = (S, {(r, a)}), where

$$
S \triangleq \begin{bmatrix} s_1\\ \vdots\\ s_d \end{bmatrix},
\qquad
r \triangleq [\,\underbrace{0 \cdots 0}_{m \text{ times}}\ \underbrace{1 \cdots 1}_{(d-m) \text{ times}}\,],
\qquad
a \triangleq \begin{bmatrix} a_1\\ \vdots\\ a_d \end{bmatrix}.
$$


Example 1. Let C be the formula x′ = x + y ∧ y′ = 2y ∧ w′ = w ∧ w = z + 1 ∧
z′ = w, and write vectors over the variables in the order x, y, w, z. The vector
space of resets has basis {([0 0 −1 1], 0)} (representing that z − w is reset to 0).
The vector space of increments has basis
{([1 −1 0 0], 0), ([0 0 1 0], 0), ([0 0 −1 1], 1)} (representing that the difference
x − y does not change, the variable w does not change, and the difference z − w
increases by 1). A best abstraction of C is thus the four-dimensional Q-VASR
abstraction (S, {(r, a)}), where

$$
S = \begin{bmatrix} 0 & 0 & -1 & 1\\ 1 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & -1 & 1 \end{bmatrix},
\qquad
r = \begin{bmatrix} 0\\ 1\\ 1\\ 1 \end{bmatrix},
\qquad
a = \begin{bmatrix} 0\\ 0\\ 0\\ 1 \end{bmatrix}.
$$

In particular, notice that since the term z − w is both incremented and reset, it
is represented by two different dimensions in α̂(C).
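For the special case in which C is a consistent conjunction of linear equalities, Res_C and Inc_C can be computed with plain linear algebra, because every equality implied by such a system is a linear combination of its equations. The Python sketch below relies on that assumption: it does not handle inequalities or integer constraints, which require the more general reasoning of the paper, and all helper names are hypothetical. Each equation is encoded as a triple (m, m′, c) meaning m · x + m′ · x′ = c. A reset (s, a) arises from a multiplier λ with λM = 0 on the unprimed coefficients, giving s = λM′ and a = λc; an increment arises from λ(M + M′) = 0. Applied to Example 1 (order x, y, w, z), it returns a basis that may differ from the one shown above but spans the same spaces.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form over Q; returns (non-zero rows, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots


def nullspace(rows, ncols):
    """Basis of {v : M v = 0} for the matrix M given by its rows."""
    red, pivots = rref(rows, ncols)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, p in zip(red, pivots):
            v[p] = -row[free]
        basis.append(v)
    return basis


def implied(eqs, n, increments):
    """Basis of Inc_C (increments=True) or Res_C (increments=False) as (s, a) pairs."""
    if increments:  # s.x' - s.x = a requires lambda (M + M') = 0
        block = [[p + q for p, q in zip(mx, mxp)] for mx, mxp, _ in eqs]
    else:           # s.x' = a requires lambda M = 0
        block = [list(mx) for mx, _, _ in eqs]
    transposed = [[row[j] for row in block] for j in range(n)]
    pairs = []
    for lam in nullspace(transposed, len(eqs)):  # left null space of block
        s = [sum(l * mxp[j] for l, (_, mxp, _) in zip(lam, eqs)) for j in range(n)]
        const = sum(l * c for l, (_, _, c) in zip(lam, eqs))
        pairs.append(s + [const])
    basis, _ = rref(pairs, n + 1)
    return [(row[:n], row[n]) for row in basis]


def alpha_hat(eqs, n):
    """alpha-hat(C) = (S, {(r, a)}): reset rows first, then increment rows."""
    res = implied(eqs, n, increments=False)
    inc = implied(eqs, n, increments=True)
    S = [s for s, _ in res + inc]
    r = [0] * len(res) + [1] * len(inc)
    adds = [const for _, const in res + inc]
    return S, r, adds


# Example 1 over (x, y, w, z): x' = x + y, y' = 2y, w' = w, w = z + 1, z' = w.
C = [([-1, -1, 0, 0], [1, 0, 0, 0], 0),
     ([0, -2, 0, 0], [0, 1, 0, 0], 0),
     ([0, 0, -1, 0], [0, 0, 1, 0], 0),
     ([0, 0, 1, -1], [0, 0, 0, 0], 1),
     ([0, 0, -1, 0], [0, 0, 0, 1], 0)]
S, r, a = alpha_hat(C, 4)
print(r)  # [0, 1, 1, 1]
```

As in the example, the output has one reset dimension and three increment dimensions; the rows of S are normalized by the row reduction rather than matching the hand-chosen basis.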


Proposition 1. For any consistent, conjunctive transition formula C, α̂(C) is
a Q-VASR abstraction of C. If C is expressed in ∃LRA, then α̂(C) is best.
3.2 Computing Least Upper Bounds


This section shows how to compute least upper bounds w.r.t. the ≼ order.

By definition of the ≼ order, if (S, V) is an upper bound of (S¹, V¹) and
(S², V²), then there must exist matrices T¹ and T² such that T¹S¹ = S = T²S²,
V¹ ⊩_{T¹} V, and V² ⊩_{T²} V. As we shall see, if (S, V) is a least upper bound,
then it is completely determined by the matrices T¹ and T². Thus, we shift our
attention to computing simulation matrices T¹ and T² that induce a least upper
bound.

In view of the desired equation T¹S¹ = S = T²S², let us consider the
constraint T¹S¹ = T²S² on two unknown matrices T¹ and T². Clearly, we have
T¹S¹ = T²S² iff each row pair (T¹ᵢ, T²ᵢ) belongs to the set 𝒯 = {(t¹, t²) : t¹S¹ = t²S²}.
Observe that 𝒯 is a vector space, so there is a best solution to the constraint
T¹S¹ = T²S²: choose T¹ and T² so that the set of all row pairs (T¹ᵢ, T²ᵢ) forms
a basis for 𝒯. In the following, we use pushout(S¹, S²) to denote a function that
computes such a best (T¹, T²).
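Because 𝒯 is the left null space of the matrix obtained by stacking S¹ on top of −S², pushout reduces to one null-space computation. The following Python sketch takes that reading, with hypothetical helper names and its own small exact-arithmetic solver, and returns T¹ and T² row by row.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form over Q; returns (non-zero rows, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots


def nullspace(rows, ncols):
    """Basis of {v : M v = 0} for the matrix M given by its rows."""
    red, pivots = rref(rows, ncols)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, p in zip(red, pivots):
            v[p] = -row[free]
        basis.append(v)
    return basis


def pushout(S1, S2):
    """Return (T1, T2) whose row pairs form a basis of {(t1, t2) : t1 S1 = t2 S2}."""
    d = len(S1)
    # (t1, t2) [S1; -S2] = 0, i.e. the null space of the transpose of the stack.
    stacked = [list(row) for row in S1] + [[-x for x in row] for row in S2]
    transposed = [list(col) for col in zip(*stacked)]
    basis = nullspace(transposed, len(stacked))
    return [t[:d] for t in basis], [t[d:] for t in basis]


def matmul(A, B):
    return [[sum(Fraction(x) * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


# S1 tracks x and y separately; S2 tracks only x + y.
S1 = [[1, 0], [0, 1]]
S2 = [[1, 1]]
T1, T2 = pushout(S1, S2)
print(matmul(T1, S1) == matmul(T2, S2))  # True
```

Here the only information shared by the two abstractions is the sum x + y, so the pushout has a single row pair, T¹ = [1 1] and T² = [1].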

While pushout gives a best solution to the equation T¹S¹ = T²S², it is
not sufficient for the purpose of computing least upper bounds for Q-VASR
abstractions, because T¹ and T² may not respect the structure of the Q-VASRs
V¹ and V² (i.e., there may be no Q-VASR V such that V¹ ⊩_{T¹} V and V² ⊩_{T²} V).
Thus, we must further constrain our problem by requiring that T¹ and T² are
coherent with respect to V¹ and V² (respectively).


Definition 4. Let V be a d-dimensional Q-VASR. We say that i, j ∈ {1, …, d}
are coherent dimensions of V if for all transitions (r, a) ∈ V we have rᵢ = rⱼ
(i.e., every transition of V that resets i also resets j and vice versa). We denote
that i and j are coherent dimensions of V by writing i ≡_V j, and observe that ≡_V
forms an equivalence relation on {1, …, d}. We refer to the equivalence classes
of ≡_V as the coherence classes of V.

A matrix T ∈ ℚ^{e×d} is coherent with respect to V if and only if each of
its rows has non-zero values only in the dimensions corresponding to a single
coherence class of V.


For any d-dimensional Q-VASR V and coherence class C = {c₁, …, c_k} of
V, define Π_C to be the k × d dimensional matrix whose rows are e_{c₁}, …, e_{c_k}.
Intuitively, Π_C is a projection onto the set of dimensions in C.
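Coherence classes depend only on the reset vectors, so they can be computed by grouping dimensions whose reset bits agree across every transformer. The Python sketch below (hypothetical names; dimensions are 0-based) does this, builds Π_C, and tests whether a matrix is coherent. Applied to V_har from Sect. 1.1, only w (index 1) is ever reset, so it forms a class of its own.

```python
def coherence_classes(V, d):
    """Partition {0, ..., d-1}: i and j are coherent iff r[i] == r[j] for every (r, a) in V."""
    classes = {}
    for i in range(d):
        signature = tuple(r[i] for r, _ in sorted(V))
        classes.setdefault(signature, []).append(i)
    return list(classes.values())


def projection(C, d):
    """Pi_C: the |C| x d matrix whose rows are the standard basis vectors e_c for c in C."""
    return [[1 if j == c else 0 for j in range(d)] for c in C]


def is_coherent(T, V, d):
    """Each row of T must be supported inside a single coherence class of V."""
    classes = [set(C) for C in coherence_classes(V, d)]
    return all(
        any({j for j, t in enumerate(row) if t != 0} <= C for C in classes)
        for row in T
    )


# V_har over (v, w, x, y, z): enqueue, dequeue fast, dequeue slow.
V_HAR = {((1, 1, 1, 1, 1), (1, 1, 4, 1, 1)),
         ((1, 1, 1, 1, 1), (-1, 0, 2, -1, 1)),
         ((1, 0, 1, 1, 1), (-1, 0, 2, -1, 1))}
print(coherence_classes(V_HAR, 5))  # [[0, 2, 3, 4], [1]]
```

For instance, the row for v + y is coherent with respect to V_har, whereas a row mixing v and w is not, because the slow dequeue transformer resets w but not v.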

Coherence is a necessary and sufficient condition for linear simulations 
between Q-VASR in a sense described in Lemmas 1 and 2. 


Lemma 1. Let V¹ and V² be Q-VASRs (of dimension d and e, respectively),
and let T ∈ ℚ^{e×d} be a matrix such that V¹ ⊩_T V². Then T must be coherent
with respect to V¹.


Let V be a d-dimensional Q-VASR and let T ∈ ℚ^{e×d} be a matrix that is
coherent with respect to V and has no zero rows. Then there is a (unique)
e-dimensional Q-VASR image(V, T) such that its transition relation →_{image(V,T)}
is equal to {(Tu, Tv) : u →_V v} (the image of V’s transition relation under T).
This Q-VASR can be defined by:

$$
\mathit{image}(V, T) \triangleq \{(T \ltimes r,\ Ta) : (r, a) \in V\}
$$

where T ⋉ r is the reset vector r translated along T (i.e., (T ⋉ r)ᵢ = rⱼ where
j is an arbitrary choice among dimensions for which Tᵢⱼ is non-zero; at least
one such j exists because the row Tᵢ is non-zero by assumption, and the choice
of j is arbitrary because all such j belong to the same coherence class by the
assumption that T is coherent with respect to V).

Algorithm 2. (S¹, V¹) ⊔ (S², V²)

input : Normal Q-VASR abstractions (S¹, V¹) and (S², V²) of equal concrete
        dimension
output: Least upper bound (w.r.t. ≼) of (S¹, V¹) and (S², V²)
S, T¹, T² ← empty matrices;
foreach coherence class C¹ of V¹ do
    foreach coherence class C² of V² do
        (U¹, U²) ← pushout(Π_{C¹}S¹, Π_{C²}S²);
        // [A; B] stacks the rows of B below those of A
        S ← [S; U¹Π_{C¹}S¹];  T¹ ← [T¹; U¹Π_{C¹}];  T² ← [T²; U²Π_{C²}];
V ← image(V¹, T¹) ∪ image(V², T²);
return (S, V)
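The definition of image(V, T) is directly executable. In the Python sketch below (hypothetical names), T ⋉ r takes, for each row of T, the reset bit of the first dimension with a non-zero coefficient, which is one legitimate choice under the coherence assumption. The example projects V_har onto v + y and w, and spot-checks on one concrete slow-dequeue step that the image relates Tu to Tv.

```python
from fractions import Fraction


def image(V, T):
    """image(V, T) = {(translated reset vector, T a) : (r, a) in V}.

    T is assumed coherent w.r.t. V and to have no zero rows, so reading the
    reset bit at the first non-zero column of each row is a valid choice.
    """
    result = set()
    for r, a in V:
        r_new = tuple(r[next(j for j, t in enumerate(row) if t != 0)] for row in T)
        a_new = tuple(sum(Fraction(t) * x for t, x in zip(row, a)) for row in T)
        result.add((r_new, a_new))
    return result


def step(r, a, u):
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))


def mat_vec(T, u):
    return tuple(sum(Fraction(t) * x for t, x in zip(row, u)) for row in T)


V_HAR = {((1, 1, 1, 1, 1), (1, 1, 4, 1, 1)),     # enqueue
         ((1, 1, 1, 1, 1), (-1, 0, 2, -1, 1)),   # dequeue fast
         ((1, 0, 1, 1, 1), (-1, 0, 2, -1, 1))}   # dequeue slow
T = [[1, 0, 0, 1, 0],   # v + y: dimensions 0 and 3 are coherent
     [0, 1, 0, 0, 0]]   # w
W = image(V_HAR, T)

slow = ((1, 0, 1, 1, 1), (-1, 0, 2, -1, 1))
u = (3, 3, 14, 3, 4)
v = step(*slow, u)
assert any(step(r, a, mat_vec(T, u)) == mat_vec(T, v) for r, a in W)
```

The image has one transformer per source transformer: v + y changes by +2 on enqueue and by −2 on either dequeue path, while w keeps its reset only on the slow path.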


Lemma 2. Let V be a d-dimensional Q-VASR and let T ∈ ℚ^{e×d} be a matrix
that is coherent with respect to V and has no zero rows. Then the transition
relation of image(V, T) is the image of V’s transition relation under T (i.e.,
→_{image(V,T)} is equal to {(Tu, Tv) : u →_V v}).


Finally, prior to describing our least upper bound algorithm, we must define 
a technical condition that is both assumed and preserved by the procedure: 


Definition 5. A Q-VASR abstraction (S,V) is normal if there is no non-zero 
vector z that is coherent with respect to V such that zS = 0 (i.e., the rows of S 
that correspond to any coherence class of V are linearly independent). 


Intuitively, a Q-VASR abstraction that is not normal contains information that 
is either inconsistent or redundant. 
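Normality is a rank condition, with one rank test per coherence class. The Python sketch below (hypothetical names) checks it for the abstraction of Example 1 and for a deliberately redundant variant that repeats the row tracking w inside the same coherence class.

```python
from fractions import Fraction


def rank(rows):
    """Rank over Q by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for c in range(ncols):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][c] / m[rk][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


def coherence_classes(r_vectors, d):
    """Group dimensions whose reset bits agree across all reset vectors."""
    classes = {}
    for i in range(d):
        classes.setdefault(tuple(r[i] for r in r_vectors), []).append(i)
    return list(classes.values())


def is_normal(S, V):
    """(S, V) is normal iff, for every coherence class C of V, the rows of S
    indexed by C are linearly independent."""
    r_vectors = sorted(r for r, _ in V)
    return all(rank([S[i] for i in C]) == len(C)
               for C in coherence_classes(r_vectors, len(S)))


S_EX1 = [[0, 0, -1, 1], [1, -1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 1]]
V_EX1 = {((0, 1, 1, 1), (0, 0, 0, 1))}
print(is_normal(S_EX1, V_EX1))  # True

S_RED = S_EX1 + [[0, 0, 1, 0]]            # w tracked twice in the increment class
V_RED = {((0, 1, 1, 1, 1), (0, 0, 0, 1, 0))}
print(is_normal(S_RED, V_RED))  # False
```

Note that the repeated row z − w in Example 1 does not violate normality, because its two copies lie in different coherence classes (one reset, one incremented).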

We now present a strategy for computing least upper bounds of Q-VASR
abstractions. Fix (normal) Q-VASR abstractions (S¹, V¹) and (S², V²). Lemmas 1
and 2 together show that a pair of matrices T¹ and T² induce an upper bound (not
necessarily least) on (S¹, V¹) and (S², V²) exactly when the following conditions
hold: (1) T¹S¹ = T²S², (2) T¹ is coherent w.r.t. V¹, (3) T² is coherent w.r.t. V²,
and (4) neither T¹ nor T² contain zero rows. The upper bound induced by T¹ and
T² is given by

$$
\mathit{ub}(T^1, T^2) \triangleq \left(T^1 S^1,\ \mathit{image}(V^1, T^1) \cup \mathit{image}(V^2, T^2)\right).
$$
We now consider how to compute a best such T¹ and T². Observe that conditions
(1), (2), and (3) hold exactly when for each row i, (T¹ᵢ, T²ᵢ) belongs to the set

$$
\mathcal{T} \triangleq \{(t^1, t^2) : t^1 S^1 = t^2 S^2 \land t^1 \text{ coherent w.r.t. } V^1 \land t^2 \text{ coherent w.r.t. } V^2\}.
$$

Since a row vector tⁱ is coherent w.r.t. Vⁱ iff its non-zero positions belong to the
same coherence class of Vⁱ (equivalently, tⁱ = uΠ_{Cⁱ} for some coherence class
Cⁱ and vector u), we have 𝒯 = ⋃_{C¹,C²} 𝒯(C¹, C²), where the union is over all
coherence classes C¹ of V¹ and C² of V², and

$$
\mathcal{T}(C^1, C^2) \triangleq \{(u^1 \Pi_{C^1},\ u^2 \Pi_{C^2}) : u^1 \Pi_{C^1} S^1 = u^2 \Pi_{C^2} S^2\}.
$$

Observe that each 𝒯(C¹, C²) is a vector space, so we can compute a pair of
matrices T¹ and T² such that the rows (T¹ᵢ, T²ᵢ) collectively form a basis for each
𝒯(C¹, C²). Since (S¹, V¹) and (S², V²) are normal (by assumption), neither T¹
nor T² may contain zero rows (condition (4) is satisfied). Finally, we have that
ub(T¹, T²) is the least upper bound of (S¹, V¹) and (S², V²). Algorithm 2 is a
straightforward realization of this strategy.
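The following Python sketch is one possible realization of Algorithm 2 under stated simplifications: exact rational arithmetic via `fractions`, a Q-VASR represented as a set of (reset, addition) tuples, coherence classes computed from reset columns, pushout computed as a null space, and T ⋉ r read off at the first non-zero column of each row. All names are hypothetical. The example joins the abstractions of two paths over (x, y): path A increments both variables, and path B increments x and resets y (its abstraction lists the reset row first, as in Sect. 3.1). The join keeps both behaviours exactly.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form over Q; returns (non-zero rows, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots


def nullspace(rows, ncols):
    red, pivots = rref(rows, ncols)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, p in zip(red, pivots):
            v[p] = -row[free]
        basis.append(v)
    return basis


def matmul(A, B):
    return [[sum(Fraction(x) * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def coherence_classes(V, d):
    classes = {}
    for i in range(d):
        classes.setdefault(tuple(r[i] for r, _ in sorted(V)), []).append(i)
    return list(classes.values())


def pushout(A, B):
    """Row pairs (u1, u2) forming a basis of {(u1, u2) : u1 A = u2 B}."""
    stacked = [list(row) for row in A] + [[-x for x in row] for row in B]
    sols = nullspace([list(col) for col in zip(*stacked)], len(stacked))
    return [s[:len(A)] for s in sols], [s[len(A):] for s in sols]


def image(V, T):
    return {(tuple(r[next(j for j, t in enumerate(row) if t != 0)] for row in T),
             tuple(sum(Fraction(t) * x for t, x in zip(row, a)) for row in T))
            for r, a in V}


def embed(u, C, d):
    """u Pi_C: place the coefficients of u at the positions listed in C."""
    row = [Fraction(0)] * d
    for k, i in enumerate(C):
        row[i] = u[k]
    return row


def lub(S1, V1, S2, V2):
    """Algorithm 2: least upper bound of two normal Q-VASR abstractions."""
    d, e = len(S1), len(S2)
    S, T1, T2 = [], [], []
    for C1 in coherence_classes(V1, d):
        for C2 in coherence_classes(V2, e):
            P1 = [S1[i] for i in C1]  # Pi_{C1} S1
            P2 = [S2[i] for i in C2]  # Pi_{C2} S2
            U1, U2 = pushout(P1, P2)
            for u1, u2 in zip(U1, U2):
                S.append(matmul([u1], P1)[0])
                T1.append(embed(u1, C1, d))
                T2.append(embed(u2, C2, e))
    return S, image(V1, T1) | image(V2, T2), T1, T2


# Path A: x' = x + 1, y' = y + 1.  Path B: x' = x + 1, y' = 0.
S_A, V_A = [[1, 0], [0, 1]], {((1, 1), (1, 1))}
S_B, V_B = [[0, 1], [1, 0]], {((0, 1), (0, 1))}  # reset row (y) first, then x
S, V, T1, T2 = lub(S_A, V_A, S_B, V_B)
print(S == [[0, 1], [1, 0]], sorted(V) == [((0, 1), (0, 1)), ((1, 1), (1, 1))])  # True True
```

The result tracks y and x separately, with one transformer per path; T¹ and T² are returned as well so that callers (such as Algorithm 3) can reuse the simulation matrices.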


Proposition 2. Let (S¹, V¹) and (S², V²) be normal Q-VASR abstractions of
equal concrete dimension. Then the Q-VASR abstraction (S, V) computed by
Algorithm 2 is normal and is a least upper bound of (S¹, V¹) and (S², V²).


4 Control Flow and Q-VASRS 


In this section, we give a method for improving the precision of our loop summa- 
rization technique by using Q-VASRS; that is, Q-VASR extended with control 
states. While Q-VASRs over-approximate control flow using non-determinism, 
Q-VASRSs allow us to analyze phenomena such as oscillating and multi-phase 
loops. 

We begin with an example that demonstrates the precision gained by
Q-VASRS. The loop in Fig. 2a oscillates between (1) incrementing variable i by 1
and (2) incrementing both variables i and x by 1. Suppose that we wish to prove
that, starting with the configuration x = 0 ∧ i = 1, the loop maintains the
invariant that 2x ≤ i. The (best) Q-VASR abstraction of the loop, pictured in
Fig. 2b, over-approximates the control flow of the loop by treating the conditional
branch in the loop as a non-deterministic branch. This over-approximation may
violate the invariant 2x ≤ i by repeatedly executing the path where both variables
are incremented. On the other hand, the Q-VASRS abstraction of the loop pictured
in Fig. 2c captures the understanding that the loop must oscillate between the
two paths. The loop summary obtained from the reachability relation of this
Q-VASRS is powerful enough to prove that the invariant 2x ≤ i holds (under the
precondition x = 0 ∧ i = 1).

(a) Oscillating loop. (b) Q-VASR abstraction. (c) Q-VASRS abstraction.

Fig. 2. An oscillating loop and its representation as a Q-VASR and Q-VASRS.
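The difference between the two abstractions can be observed on concrete runs. The Python sketch below models only the two paths described in the text, path (1) incrementing i and path (2) incrementing both i and x, under hypothetical function names, and checks the invariant 2x ≤ i from x = 0 ∧ i = 1. Strict alternation, which is what the Q-VASRS enforces, preserves the invariant whichever path comes first, while the non-deterministic choice permitted by the Q-VASR lets path (2) repeat and break it.

```python
def path_a(x, i):
    """Path (1): increment i only."""
    return x, i + 1


def path_b(x, i):
    """Path (2): increment both i and x."""
    return x + 1, i + 1


def run(schedule, x=0, i=1):
    """Execute the given sequence of paths from x = 0, i = 1; return all visited states."""
    states = [(x, i)]
    for path in schedule:
        x, i = path(x, i)
        states.append((x, i))
    return states


def invariant_holds(states):
    return all(2 * x <= i for x, i in states)


alternate_a_first = run([path_a, path_b] * 10)
alternate_b_first = run([path_b, path_a] * 10)
repeat_b = run([path_b] * 3)  # allowed by the Q-VASR, excluded by the Q-VASRS
print(invariant_holds(alternate_a_first), invariant_holds(alternate_b_first),
      invariant_holds(repeat_b))  # True True False
```

Concrete runs can only refute an invariant; the proof for all executions comes from the reachability relation of the Q-VASRS, as described above.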


4.1 Technical Details 


In the following, we give a method for over-approximating the transitive closure
of a transition formula F(x, x′) using a Q-VASRS. We start by defining predicate
Q-VASRS, a variation of Q-VASRS with control states that correspond to
disjoint state predicates (where the states intuitively belong to the transition 
formula F rather than the Q-VASRS itself). We extend linear simulations and 
best abstractions to predicate Q-VASRS, and give an algorithm for synthesizing 
best predicate Q-VASRS abstractions (for a given set of predicates). Finally, we 
give an end-to-end algorithm for over-approximating the transitive closure of a 
transition formula. 


Definition 6. A predicate Q-VASRS over x is a Q-VASRS V = (P, E), such
that each control state is a predicate over the variables x and the predicates in
P are pairwise inconsistent (for all p ≠ q ∈ P, p ∧ q is unsatisfiable).


We extend linear simulations to predicate Q-VASRS as follows: 


— Let F(x, x′) be an n-dimensional transition formula and let V = (P, E) be
an m-dimensional Q-VASRS over x. We say that a linear transformation
S ∈ ℚ^{m×n} is a linear simulation from F to V if for all u, v ∈ ℚⁿ such that
u →_F v, (1) there is a (unique) p ∈ P such that p(u) is valid, (2) there is a
(unique) q ∈ P such that q(v) is valid, and (3) (p, Su) →_V (q, Sv).

— Let V¹ = (P¹, E¹) and V² = (P², E²) be predicate Q-VASRSs over x (for
some x) of dimensions d and e, respectively. We say that a linear transformation
S ∈ ℚ^{e×d} is a linear simulation from V¹ to V² if for all p¹, q¹ ∈ P¹ and for
all u, v ∈ ℚᵈ such that (p¹, u) →_{V¹} (q¹, v), there exist (unique) p², q² ∈ P²
such that (1) (p², Su) →_{V²} (q², Sv), (2) p¹ ⊨ p², and (3) q¹ ⊨ q².


We define a Q-VASRS abstraction over x = x₁, …, xₙ to be a pair (S, V)
consisting of a rational matrix S ∈ ℚ^{d×n} and a predicate Q-VASRS of dimension
d over x. We extend the simulation preorder ≼ to Q-VASRS abstractions in the
natural way. Extending the definition of “best” abstractions requires more care,
since we can always find a “better” Q-VASRS abstraction (strictly smaller in ≼
order) by using a finer set of predicates. However, if we consider only predicate
Algorithm 3. abstract-VASRS(F, P)

input : Transition formula F(x, x′), set of pairwise-disjoint predicates P over
        x such that for all u, v with u →_F v, there exist p, q ∈ P with p(u)
        and q(v) both valid
output: Best Q-VASRS abstraction of F with control states P
For all p, q ∈ P, let (S_{p,q}, V_{p,q}) ← abstract-VASR(p(x) ∧ F(x, x′) ∧ q(x′));
(S, V) ← least upper bound of all (S_{p,q}, V_{p,q});
For all p, q ∈ P, let T_{p,q} ← the simulation matrix from (S_{p,q}, V_{p,q}) to (S, V);
E ← {(p, r, a, q) : p, q ∈ P, (r, a) ∈ image(V_{p,q}, T_{p,q})};
return (S, (P, E))


Q-VASRS that share the same set of control states, then best abstractions do
exist and can be computed using Algorithm 3.

Algorithm 3 works as follows: first, for each pair of formulas p, q ∈ P, compute
a best Q-VASR abstraction of the formula p(x) ∧ F(x, x′) ∧ q(x′) and call it
(S_{p,q}, V_{p,q}). (S_{p,q}, V_{p,q}) over-approximates the transitions of F that begin in a
program state satisfying p and end in a program state satisfying q. Second, we
compute the least upper bound of all Q-VASR abstractions (S_{p,q}, V_{p,q}) to get
a Q-VASR abstraction (S, V) for F. As a side-effect of the least upper bound
computation, we obtain a linear simulation T_{p,q} from (S_{p,q}, V_{p,q}) to (S, V) for
each p, q. A best Q-VASRS abstraction of F(x, x′) with control states P has S
as its simulation matrix and has the image of V_{p,q} under T_{p,q} as the edges from
p to q.


Proposition 3. Given a transition formula F(x, x′) and control states P over
x, Algorithm 3 computes the best predicate Q-VASRS abstraction of F with
control states P.


We now describe iter-VASRS (Algorithm 4), which uses Q-VASRS to over-approximate
the transitive closure of transition formulas. Towards our goal
of predictable program analysis, we desire the analysis to be monotone in
the sense that if F and G are transition formulas such that F entails G,
then iter-VASRS(F) entails iter-VASRS(G). A sufficient condition to guarantee
monotonicity of the overall analysis is to require that the set of control states
that we compute for F is at least as fine as the set of control states we compute
for G. We can achieve this by making the set of control states P of input
transition formula F(x, x′) equal to the set of connected regions of the topological
closure of ∃x′.F (lines 1–4). Note that this set of predicates may fail the
contract of abstract-VASRS: there may exist a transition u →_F v such that
v ⊭ ⋁P (this occurs when there is a state of F with no outgoing transitions).
As a result, (S, V) = abstract-VASRS(F, P) does not necessarily approximate
F; however, it does over-approximate F ∧ ⋁P(x′). An over-approximation of
the transitive closure of F can easily be obtained from reach(V)(Sx, Sx′) (the
over-approximation of the transitive closure of F ∧ ⋁P(x′) obtained from the
Q-VASRS abstraction (S, V)) by sequentially composing with the disjunction of
F and the identity relation (line 6).


Algorithm 4. iter-VASRS(F)


input : Transition formula $F(x, x')$
output: Over-approximation of the transitive closure of F
1 $P \leftarrow$ topological closure of DNF of $\exists x'.F$ (see [17]);
2 /* Compute connected regions */
3 while $\exists p_1, p_2 \in P$ with $p_1 \wedge p_2$ satisfiable do
4 |   $P \leftarrow (P \setminus \{p_1, p_2\}) \cup \{p_1 \vee p_2\}$
5 $(S, V) \leftarrow$ abstract-VASRS(F, P);
6 return $\mathit{reach}(V)(Sx, Sx') \circ (x' = x \vee F)$
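The merging loop in lines 1-4 of Algorithm 4 can be sketched on a toy representation. This is a minimal sketch, not the authors' implementation: it assumes each predicate is a disjunction of closed intervals over a single variable (standing in for topologically closed DNF cubes), so that satisfiability of $p_1 \wedge p_2$ reduces to interval overlap. The names `satisfiable_conj` and `connected_regions` are hypothetical.

```python
from itertools import combinations

Interval = tuple[float, float]   # closed interval [lo, hi] over one variable
Region = frozenset               # a predicate: a disjunction of closed intervals


def satisfiable_conj(p1: Region, p2: Region) -> bool:
    """p1 AND p2 is satisfiable iff some interval of p1 meets some interval of p2."""
    return any(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1) in p1 for (lo2, hi2) in p2)


def connected_regions(cubes: list[Interval]) -> set[Region]:
    """Lines 1-4 of Algorithm 4, specialised to interval predicates."""
    P = {frozenset([c]) for c in cubes}          # one predicate per closed cube
    while True:
        pair = next(((p1, p2) for p1, p2 in combinations(P, 2)
                     if satisfiable_conj(p1, p2)), None)
        if pair is None:                         # no two predicates overlap
            return P
        p1, p2 = pair
        P = (P - {p1, p2}) | {p1 | p2}           # P <- (P \ {p1, p2}) U {p1 OR p2}


regions = connected_regions([(0, 1), (1, 2), (5, 6), (5.5, 7), (10, 11)])
```

Because the loop only ever unions overlapping predicates, the result is the set of connected components of the overlap graph, independent of the order in which pairs are picked; with closed intervals, $[0, 1]$ and $[1, 2]$ touch and therefore merge.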



Precision Improvement. The abstract-VASRS algorithm uses predicates to infer
the control structure of a Q-VASRS, but after computing the Q-VASRS abstraction,
iter-VASRS makes no further use of the predicates (i.e., the predicates are
irrelevant in the computation of reach(V)). Predicates can be used to improve
iter-VASRS as follows: the reachability relation of a Q-VASRS is expressed by
a formula that uses auxiliary variables to represent the state at which the computation
begins and ends [8]. These variables can be used to encode that the
pre-state of the transitive closure must satisfy the predicate corresponding to the
begin state and the post-state must satisfy the predicate corresponding to the
end state. As an example, consider Fig. 2 and suppose that we wish to prove
the invariant $x < 2i$ under the pre-condition $i = 0 \wedge x = 0$. While this invariant
holds, we cannot prove it, because there is a counterexample if the computation
begins at i%2 == 1. By applying the above improvement, we can prove that the
computation must begin at i%2 == 0, and the invariant is verified.


5 Evaluation 


The goal of our evaluation is to answer the following questions:


— Are Q-VASR sufficiently expressive to be able to generate accurate loop sum- 
maries? 

— Does the Q-VASRS technique improve upon the precision of Q-VASR? 

— Are the Q-VASR/Q-VASRS loop summarization algorithms performant? 


We implemented our loop summarization procedure and the compositional
whole-program summarization technique described in Sect. 1.1. We ran our implementation
on a suite of 165 benchmarks, drawn from the C4B [2] and HOLA [4] suites, as well as the
safe, integer-only benchmarks in the loops category of SV-Comp 2019 [22]. We
ran each benchmark with a time-out of 5 min, and recorded how many benchmarks
were proved safe by our Q-VASR-based technique and our Q-VASRS-based
technique. For context, we also compare with CRA [14] (a related loop
summarization technique), as well as SeaHorn [7] and UltimateAutomizer [9]
(state-of-the-art software model checkers). The results are shown in Fig. 3.

The number of assertions proved correct using Q-VASR is comparable to 
both SeaHorn and UltimateAutomizer, demonstrating that Q-VASR can indeed 
model interesting loop phenomena. Q-VASRS-based summarization significantly 
improves precision, proving the correctness of 93% of assertions in the svcomp 
suite, and more than any other tool in total. Note that the most precise tool for 
each suite is not strictly better than each of the other tools; in particular, there 
is only a single program in the HOLA suite that neither Q-VASRS nor CRA can 
prove safe. 

CRA-based summarization is the most performant of all the compared tech- 
niques, followed by Q-VASR and Q-VASRS. SeaHorn and UltimateAutomizer 
employ abstraction-refinement loops, and so take significantly longer to run the 
test suite. 
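The percentages and rankings quoted above can be recomputed from Fig. 3. This sketch simply transcribes the table and recomputes totals; the assignment of the two model-checker columns to UltAuto and SeaHorn follows the order in which their names appear in the figure (an assumption), and the time column is summed without assuming a unit. The helper name `totals` is hypothetical.

```python
# Transcribed from Fig. 3: suite -> (number of benchmarks, {tool: (#safe, time)}).
FIG3 = {
    "C4B": (35, {"Q-VASR": (21, 37.9), "Q-VASRS": (31, 35.4), "CRA": (27, 33.1),
                 "UltAuto": (23, 2434.4), "SeaHorn": (25, 3881.6)}),
    "HOLA": (46, {"Q-VASR": (32, 57.2), "Q-VASRS": (39, 73.0), "CRA": (40, 56.0),
                  "UltAuto": (35, 2115.0), "SeaHorn": (36, 2995.9)}),
    "svcomp19-int": (84, {"Q-VASR": (68, 86.9), "Q-VASRS": (78, 184.5),
                          "CRA": (76, 91.9), "UltAuto": (62, 3038.0),
                          "SeaHorn": (64, 6923.5)}),
}


def totals(results):
    """Sum #safe and time per tool over all suites."""
    safe, runtime = {}, {}
    for _, per_tool in results.values():
        for tool, (n, t) in per_tool.items():
            safe[tool] = safe.get(tool, 0) + n
            runtime[tool] = runtime.get(tool, 0.0) + t
    return safe, runtime


safe, runtime = totals(FIG3)
n_sv, sv = FIG3["svcomp19-int"]
print(f"Q-VASRS on svcomp19-int: {sv['Q-VASRS'][0]}/{n_sv} "
      f"= {100 * sv['Q-VASRS'][0] / n_sv:.0f}%")
print("most benchmarks proved safe in total:", max(safe, key=safe.get))
print("tools from fastest to slowest:", sorted(runtime, key=runtime.get))
```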


Suite          #bench   Q-VASR          Q-VASRS         CRA             UltAuto          SeaHorn
                        #safe   time    #safe   time    #safe   time    #safe   time     #safe   time
C4B            35       21      37.9    31      35.4    27      33.1    23      2434.4   25      3881.6
HOLA           46       32      57.2    39      73.0    40      56.0    35      2115.0   36      2995.9
svcomp19-int   84       68      86.9    78      184.5   76      91.9    62      3038.0   64      6923.5


Fig. 3. Experimental results.


6 Related Work 


Compositional Analysis. Our analysis follows the same high-level structure as 
compositional recurrence analysis (CRA) [5,14]. Our analysis differs from CRA 
in the way that it summarizes loops: we compute loop summaries by over- 
approximating loops with vector addition systems and computing reachability 
relations, whereas CRA computes loop summaries by extracting recurrence relations
and computing closed forms. The advantage of our approach is that
we can use Q-VASR to accurately model multi-path loops and can make theo- 
retical guarantees about the precision of our analysis; the advantage of CRA is 
its ability to generate non-linear invariants. 


Vector Addition Systems. Our invariant generation method draws upon Haase 
and Halfon’s polytime procedure for computing the reachability relation of inte- 
ger vector addition systems with states and resets [8]. Generalization from the 
integer case to the rational case is straightforward. Continuous Petri nets [3] are 
a related generalization of vector addition systems, where time is taken to be 
continuous (Q-VASR, in contrast, have rational state spaces but discrete time). 
Reachability for continuous Petri nets is computable in polynomial time [6] and definable
in SLRA [1]. 
Sinn et al. present a technique for resource bound analysis that is based on
modeling programs by lossy vector addition systems with states [21]. Sinn et al.
model programs using vector addition systems with states over the natural num- 
bers, which enables them to use termination bounds for VASS to compute upper 
bounds on resource usage. In contrast, we use VASS with resets over the rationals, 
which (in contrast to VASS over N) have a SLIRA-definable reachability relation, 
enabling us to summarize loops. Moreover, Sinn et al.’s method for extracting 
VASS models of programs is heuristic, whereas our method gives precision guar- 
antees. 


Affine and Polynomial Programs. The problem of polynomial invariant genera- 
tion has been investigated for various program models that generalize Q-VASR, 
including solvable polynomial loops [19], (extended) P-solvable loops [11,15], 
and affine programs [10]. Like ours, these techniques are predictable in the sense 
that they can make theoretical guarantees about invariant quality. The kinds of
invariants that can be produced using these techniques (conjunctions of polynomial
equations) are incomparable with those generated by the method presented
in this paper (ILIRA formulas). 


Symbolic Abstraction. The main contribution of this paper is a technique for 
synthesizing the best abstraction of a transition formula expressible in the lan- 
guage of Q-VASR (with or without states). This is closely related to the sym- 
bolic abstraction problem, which computes the best abstraction of a formula 
within an abstract domain. The problem of computing best abstractions has been 
undertaken for finite-height abstract domains [18], template constraint matrices 
(including intervals and octagons) [16], and polyhedra [5,24]. Our best abstrac- 
tion result differs in that (1) it is for a disjunctive domain and (2) the notion of 
“best” is based on simulation rather than the typical order-theoretic framework. 
Abstract. Automated reasoning procedures are essential for a number
of applications that involve bit-exact floating-point computations. This 
paper presents conditions that characterize when a variable in a floating- 
point constraint has a solution, which we call invertibility conditions. We 
describe a novel workflow that combines human interaction and a syntax- 
guided synthesis (SyGuS) solver that was used for discovering these con- 
ditions. We verify our conditions for several floating-point formats. One 
implication of this result is that a fragment of floating-point arithmetic 
admits compact quantifier elimination. We implement our invertibility 
conditions in a prototype extension of our solver CVC4, showing their 
usefulness for solving quantified constraints over floating-points. 


1 Introduction 


Satisfiability Modulo Theories (SMT) formulas including either the theory of 
floating-point numbers [12] or universal quantifiers [24,32] are widely regarded 
as some of the hardest to solve. Problems that combine universal quantification 
over floating-points are rare—experience to date has suggested they are hard for 
solvers and would-be users should either give up or develop their own incomplete 
techniques. However, progress in theory solvers for floating-point [11] and the 
use of expression synthesis for handling universal quantifiers [27,29] suggest that 
these problems may not be entirely out of reach after all, which could potentially 
impact a number of interesting applications. 

This paper makes substantial progress towards a scalable approach for solv- 
ing quantified floating-point constraints directly in an SMT solver. Developing 
procedures for quantified floating-points requires considerable effort, both foun- 
dationally and in practice. We focus primarily on establishing a foundation for 
lifting to quantified floating-point formulas a procedure for solving quantified 
bit-vector formulas by Niemetz et al. [26]. That procedure relies on so-called 
invertibility conditions, intuitively, formulas that state under which conditions
an argument of a given operator and predicate in an equation has a solution. 
Building on this concept and a state-of-the-art expression synthesis engine [29], 
we generate invertibility conditions for a majority of operators and predicates in 
the theory of floating-point numbers. In the context of quantifier-free floating- 
point formulas, floating-point invertibility conditions may enable us to lift the 
propagation-based local search approach for bit-vectors in [25] to the theory of 
floating-point numbers. 

This work demonstrates that invertibility conditions exist and show promise 
for solving quantified floating-point constraints. More specifically, it makes the 
following contributions: 


— In Sect.3, we present invertibility conditions for the majority of operators 
and predicates in the SMT-LIB standard theory of floating-point numbers. 

— In Sect. 4, we present a custom methodology based on syntax-guided synthesis 
and decision tree learning that we developed for the purpose of synthesizing 
the invertibility conditions presented here. 

— In Sect. 5, we present a quantifier elimination procedure for a fragment of 
the theory that is based on invertibility conditions, and give experimental 
evidence of its potential, based on quantified floating-point problems coming 
from a verification application. 


Related Work. To our knowledge, no previous work specifically discusses tech- 
niques for solving universally quantified floating-point formulas. Brain et al. [11] 
provide a comprehensive review of decision procedures for quantifier-free bit-exact
floating-point arithmetic, covering both SMT-based and other approaches. They
identify four groups of techniques: bit-blasting approaches that use floating-point
circuits to generate bit-vector formulas [13,16,20,33], interval techniques that
use partitioning and interval propagation [10,22,23,31], optimization and numerical
approaches that work with complete valuations [4,7,18,21], and axiomatic
techniques that use partial or total axiomatizations of the theory of floating-point
numbers in other theories such as real arithmetic [14,15].

On the other hand, approaches for universal quantification have been devel- 
oped in modern SMT solvers that target other background theories, includ- 
ing linear arithmetic [8,17,29] and bit-vectors [26,27,32]. At a high level, these 
approaches use model-based refinement loops that lazily add instances of univer- 
sal quantifiers until they reach a conflict at the quantifier-free level, or otherwise 
saturate with a model. 


2 Preliminaries 


We assume the usual notions and terminology of many-sorted first-order logic with
equality (denoted by $\approx$). Let $\Sigma$ be a signature consisting of a set $\Sigma^s$ of sort symbols
and a set $\Sigma^f$ of interpreted (and sorted) function symbols. Each function symbol f
has a sort $\tau_1 \times \cdots \times \tau_n \to \tau$, with arity $n \geq 0$ and $\tau_1, \ldots, \tau_n, \tau \in \Sigma^s$. We assume that
$\Sigma$ includes a Boolean sort Bool and the Boolean constants $\top$ (true) and $\bot$ (false).
We further assume the usual definition of well-sorted terms, literals, and (quantified)
formulas with variables and symbols from $\Sigma$, and refer to them as $\Sigma$-terms,
$\Sigma$-atoms, and so on. For a $\Sigma$-term or $\Sigma$-formula e, we denote the free variables
of e (defined as usual) as $FV(e)$ and use $e[x]$ to denote that the variable x occurs
free in e. We write $e[t]$ for the term or formula obtained from e by replacing each
occurrence of x in e by t.

A theory T is a pair $(\Sigma, I)$, where $\Sigma$ is a signature and I is a non-empty class
of $\Sigma$-interpretations (the models of T) that is closed under variable reassignment,
i.e., every $\Sigma$-interpretation that only differs from an $\mathcal{I} \in I$ in how it interprets
variables is also in I. A $\Sigma$-formula $\varphi$ is T-satisfiable (resp. T-unsatisfiable) if it
is satisfied by some (resp. no) interpretation in I; it is T-valid if it is satisfied by
all interpretations in I. We will sometimes omit T when the theory is understood
from context.

We briefly recap the terminology and notation of Brain et al. [12], which
defines an SMT-LIB theory $T_{FP}$ of floating-point numbers based on the IEEE-754
2008 standard [3]. The signature of $T_{FP}$ includes a parametric family of
sorts $F_{\epsilon,\sigma}$ where $\epsilon$ and $\sigma$ are integers greater than or equal to 2 giving the
number of bits used to store the exponent e and significand s, respectively.
Each of these sorts contains five kinds of constants: normal numbers of the form
$1.s \cdot 2^{e}$, subnormal numbers of the form $0.s \cdot 2^{-2^{\epsilon-1}+2}$, two zeros (+0 and −0),
two infinities (+∞ and −∞) and a single not-a-number (NaN). We assume a
map $v_{\epsilon,\sigma}$ for each sort, which maps these constants to their value in the set
$\mathbb{R}^* = \mathbb{R} \cup \{+\infty, -\infty, \mathrm{NaN}\}$. The theory also provides a rounding-mode sort RM,
which contains five elements {RNE, RNA, RTP, RTN, RTZ}.

Table 1 lists all considered operators and predicate symbols of theory $T_{FP}$.
The theory contains a full set of arithmetic operations $\{|\ldots|, +, -, \cdot, \div, \sqrt{\ }, \max, \min\}$
as well as rem (remainder), rti (round to integral) and fma (combined multiply
and add with just one rounding). The precise semantics of these operators
is given in [12] and follows the same general pattern: $v_{\epsilon,\sigma}$ is used to project the
arguments to $\mathbb{R}^*$, the normal arithmetic is performed in $\mathbb{R}^*$, then the rounding
mode and the result are used to select one of the adjoints of $v_{\epsilon,\sigma}$ to convert
the result back to $F_{\epsilon,\sigma}$. Note that the full theory in [12] includes several additional
operators which we omit from discussion here, such as floating-point minimum/maximum,
equality with floating-point semantics (fp.eq), and conversions
between sorts.

Theory $T_{FP}$ further defines a set of ordering predicates $\{<, >, \leq, \geq\}$ and a
set of classification predicates {isNorm, isSub, isInf, isZero, isNaN, isNeg, isPos}. In
the following, we denote the rounding mode of an operation above the operator
symbol, e.g., $a \overset{\mathrm{RTZ}}{+} b$ adds a and b and rounds the result towards zero. We use the
infix operator style for isInf $(\ldots \approx \pm\infty)$, isZero $(\ldots \approx \pm 0)$, and isNaN $(\ldots \approx \mathrm{NaN})$
for conciseness. We further use $\min_n/\max_n$ and $\min_s/\max_s$ for floating-point
constants representing the minimum/maximum normal and subnormal
numbers, respectively. We will omit rounding mode and floating-point sorts if
they are clear from the context.
3 Invertibility Conditions for Floating-Point Formulas


In this section, we adapt the concept of invertibility conditions introduced by
Niemetz et al. in [26] to our theory $T_{FP}$. Intuitively, an invertibility condition $\phi_c$
for a literal $l[x]$ is the exact condition under which $l[x]$ has a solution for x, i.e.,
$\phi_c$ is equivalent to $\exists x.\, l[x]$ in $T_{FP}$.


Definition 1 (Floating-Point Invertibility Condition). Let $l[x]$ be a $\Sigma_{FP}$-literal.
A quantifier-free $\Sigma_{FP}$-formula $\phi_c$ is an invertibility condition for x in $l[x]$ if
$x \notin FV(\phi_c)$ and $\phi_c \Leftrightarrow \exists x.\, l[x]$ is $T_{FP}$-valid.


As a simple example of an invertibility condition, given literal $|x| \approx t$ where
$|x|$ denotes the absolute value of x, a solution for x exists if and only if t is
not negative, i.e., if $\neg\mathrm{isNeg}(t)$ holds. We introduce additional terminology for
the sake of the discussion. We define the dimension of an invertibility condition
problem $\exists x.\, l[x]$ as the number of free variables it contains. For example, if s
and t are variables, then the dimension of $\exists x.\, x + s \approx t$ is two, the dimension of
$\exists x.\, \mathrm{isZero}(x + s)$ is one, and the dimension of $\exists x.\, \mathrm{isZero}(|x|)$ is zero. A literal $l[x]$
is fully invertible if its invertibility condition is $\top$. A term e is an (unconditional)
inverse for x in $l[x]$ if $l[e]$ is equivalent to $\top$. For example, the literal $-x \approx t$
is fully invertible and $-t$ is an inverse for x in this literal. We say that e is a
conditional inverse for $l[x]$ if $l[e]$ is an invertibility condition for $l[x]$.
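For a fixed format, the $|x| \approx t$ example can be checked by brute force. The sketch below is an illustration, not part of the authors' workflow: it assumes IEEE binary16 (which corresponds to $F_{5,11}$) decoded with Python's `struct` format `'e'`, and models $\approx$ as SMT-LIB equality (all NaNs equal, +0 and −0 distinct). It computes the image of $|x|$ once and compares membership against $\neg\mathrm{isNeg}(t)$ for every t.

```python
import math
import struct


def all_binary16():
    """Every IEEE binary16 bit pattern, decoded to a Python float."""
    for bits in range(1 << 16):
        yield struct.unpack("<e", struct.pack("<H", bits))[0]


def key(v: float):
    """Key for SMT-LIB '=': all NaNs identical, +0 and -0 distinct."""
    return "NaN" if math.isnan(v) else (v, math.copysign(1.0, v))


def is_neg(v: float) -> bool:
    """fp.isNegative: true for -0 and negative numbers, false for NaN."""
    return not math.isnan(v) and math.copysign(1.0, v) < 0


# Image of |x| over all binary16 inputs x.
image = {key(abs(x)) for x in all_binary16()}

# For every t: (exists x. |x| = t) holds exactly when not isNeg(t).
assert all((key(t) in image) == (not is_neg(t)) for t in all_binary16())
```

NaN satisfies the condition because $|\mathrm{NaN}| = \mathrm{NaN}$, while −0 does not, since the absolute value always clears the sign bit.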

Our primary goal in this work is to establish invertibility conditions for all 
floating-point constraints that contain exactly one operator and one predicate. 
These conditions collectively suffice to characterize when any literal $l[x]$ containing
exactly one occurrence of x, the variable to solve for, has a solution. In
total, we were able to establish 167 out of 188 invertibility conditions (count- 
ing commutative cases only once) using a syntax-guided synthesis framework 
which we describe in more detail in Sect. 4. In this section, we present a subset 
of these invertibility conditions, highlighting the most interesting cases where 


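Table 1. Considered floating-point predicates/operators, with SMT-LIB 2 syntax.


Symbol                          SMT-LIB syntax                        Sort
isNorm, isSub                   fp.isNormal, fp.isSubnormal           $F_{\epsilon,\sigma} \to \mathrm{Bool}$
isPos, isNeg                    fp.isPositive, fp.isNegative          $F_{\epsilon,\sigma} \to \mathrm{Bool}$
isInf, isNaN, isZero            fp.isInfinite, fp.isNaN, fp.isZero    $F_{\epsilon,\sigma} \to \mathrm{Bool}$
$\approx, <, >, \leq, \geq$     =, fp.lt, fp.gt, fp.leq, fp.geq       $F_{\epsilon,\sigma} \times F_{\epsilon,\sigma} \to \mathrm{Bool}$
$|\ldots|$, $-$                 fp.abs, fp.neg                        $F_{\epsilon,\sigma} \to F_{\epsilon,\sigma}$
rem                             fp.rem                                $F_{\epsilon,\sigma} \times F_{\epsilon,\sigma} \to F_{\epsilon,\sigma}$
$\sqrt{\ }$, rti                fp.sqrt, fp.roundToIntegral           $\mathrm{RM} \times F_{\epsilon,\sigma} \to F_{\epsilon,\sigma}$
$+, -, \cdot, \div$             fp.add, fp.sub, fp.mul, fp.div        $\mathrm{RM} \times F_{\epsilon,\sigma} \times F_{\epsilon,\sigma} \to F_{\epsilon,\sigma}$
fma                             fp.fma                                $\mathrm{RM} \times F_{\epsilon,\sigma} \times F_{\epsilon,\sigma} \times F_{\epsilon,\sigma} \to F_{\epsilon,\sigma}$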
we succeeded (or failed) to establish an invertibility condition. Due to space
restrictions, we omit the conditions for the remaining cases.¹


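Table 2. Invertibility conditions for floating-point operators (excl. fma) with $\approx$.


Literal                                     Invertibility condition
$x \overset{R}{+} s \approx t$              $t \approx (t \overset{\mathrm{RTP}}{-} s) \overset{R}{+} s \vee t \approx (t \overset{\mathrm{RTN}}{-} s) \overset{R}{+} s \vee s \approx t$
$x \overset{R}{-} s \approx t$              $t \approx (s \overset{\mathrm{RTP}}{+} t) \overset{R}{-} s \vee t \approx (s \overset{\mathrm{RTN}}{+} t) \overset{R}{-} s \vee (s \not\approx t \wedge s \approx \pm\infty \wedge t \approx \pm\infty)$
$s \overset{R}{-} x \approx t$              $t \approx s \overset{R}{+} (t \overset{\mathrm{RTP}}{-} s) \vee t \approx s \overset{R}{+} (t \overset{\mathrm{RTN}}{-} s) \vee s \approx t$
$x \overset{R}{\cdot} s \approx t$          $t \approx (t \overset{\mathrm{RTP}}{\div} s) \overset{R}{\cdot} s \vee t \approx (t \overset{\mathrm{RTN}}{\div} s) \overset{R}{\cdot} s \vee (s \approx \pm\infty \wedge t \approx \pm\infty) \vee (s \approx \pm 0 \wedge t \approx \pm 0)$
$x \overset{R}{\div} s \approx t$           $t \approx (s \overset{\mathrm{RTP}}{\cdot} t) \overset{R}{\div} s \vee t \approx (s \overset{\mathrm{RTN}}{\cdot} t) \overset{R}{\div} s \vee (s \approx \pm\infty \wedge t \approx \pm 0) \vee (t \approx \pm\infty \wedge s \approx \pm 0)$
$s \overset{R}{\div} x \approx t$           $t \approx s \overset{R}{\div} (s \overset{\mathrm{RTP}}{\div} t) \vee t \approx s \overset{R}{\div} (s \overset{\mathrm{RTN}}{\div} t) \vee (s \approx \pm\infty \wedge t \approx \pm\infty) \vee (s \approx \pm 0 \wedge t \approx \pm 0)$
$x \,\mathrm{rem}\, s \approx t$            $t \approx t \,\mathrm{rem}\, s$
$s \,\mathrm{rem}\, x \approx t$            ?
$\overset{R}{\sqrt{x}} \approx t$           $t \approx \overset{R}{\sqrt{t \overset{\mathrm{RTP}}{\cdot} t}} \vee t \approx \overset{R}{\sqrt{t \overset{\mathrm{RTN}}{\cdot} t}} \vee t \approx \pm 0$
$|x| \approx t$                             $\neg\mathrm{isNeg}(t)$
$-x \approx t$                              $\top$
$\overset{R}{\mathrm{rti}}(x) \approx t$    $t \approx \overset{R}{\mathrm{rti}}(t)$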


Table 2 lists the invertibility conditions for equality with the operators
$\{+, -, \cdot, \div, \mathrm{rem}, \sqrt{\ }, |\ldots|, -, \mathrm{rti}\}$, parameterized over a rounding mode R (one of
RNE, RNA, RTP, RTN, or RTZ). Note that operators $\{+, \cdot\}$ and the multiplicative
step of fma are commutative, and thus the invertibility conditions for both
variants are identical.

Each of the first six invertibility conditions in this table follows a pattern. The 
first two disjuncts are instances of the literal to solve for, where a term involving 
rounding modes RTP and RTN is substituted for x. These disjuncts are then 
followed by disjuncts for handling special cases for infinity and zero. From the 
structure of these conditions, e.g., for +, we can derive the insight that if there
is a solution for x in the equation $x \overset{R}{+} s \approx t$ and we are not in a corner case where
$s \approx t$, then either $t \overset{\mathrm{RTP}}{-} s$ or $t \overset{\mathrm{RTN}}{-} s$ must be a solution. Based on extensive runs of our
syntax-guided synthesis procedure, we believe this condition is close to having
minimal term size. From this, we conclude that an efficient yet complete method
for solving $x \overset{R}{+} s \approx t$ checks whether $t - s$ rounded towards positive or negative
is a solution in the non-trivial case when s and t are disequal and, if neither is,
concludes that no solution exists. A similar insight can be derived for the other
invertibility conditions of this form.
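The method just described can be sketched for a single concrete setting. This sketch assumes binary64 and R = RNE (Python's native float arithmetic), obtains the RTP/RTN roundings of $t - s$ from exact rational arithmetic with `fractions.Fraction`, and uses `math.nextafter` (Python 3.9+); the function names are hypothetical and do not correspond to CVC4's API. The trivial case $s \approx t$ is answered with $x = -0$, which is an additive identity under RNE.

```python
import math
import sys
from fractions import Fraction


def smt_eq(a: float, b: float) -> bool:
    """SMT-LIB '=' on floats: all NaNs are equal, +0 and -0 are distinct."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def directed_roundings(exact: Fraction) -> tuple[float, float]:
    """Return the (RTP, RTN) roundings of an exact rational to binary64."""
    try:
        rn = float(exact)                        # correctly rounded to nearest
    except OverflowError:                        # |exact| beyond the finite range
        big = sys.float_info.max
        return (math.inf, big) if exact > 0 else (-big, -math.inf)
    r = Fraction(rn)
    if r == exact:
        return rn, rn
    if r < exact:
        return math.nextafter(rn, math.inf), rn
    return rn, math.nextafter(rn, -math.inf)


def solve_add(s: float, t: float):
    """Return some x with x + s == t (binary64, RNE), or None if none exists."""
    if smt_eq(s, t):
        return -0.0                              # -0 + s == s for every s
    if math.isfinite(s) and math.isfinite(t):
        candidates = directed_roundings(Fraction(t) - Fraction(s))
    else:
        candidates = (t - s,)                    # exact for infinities and NaN
    for x in candidates:
        if smt_eq(x + s, t):
            return x
    return None
```

For example, `solve_add(1.0, 2.0 ** -60)` returns `None`: the two candidates are −1 and its upper neighbour, and adding 1 to either yields 0 or $2^{-53}$, never $2^{-60}$.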


¹ Available at https://cvc4.cs.stanford.edu/papers/CAV2019-FP.
We found that t is a conditional inverse for the cases $\overset{R}{\mathrm{rti}}(x) \approx t$ and
$x \,\mathrm{rem}\, s \approx t$, that is, substituting t for x is an invertibility condition. For the
latter, we discovered an alternative invertibility condition:


RTP RTN : 
lEt t| < |s| V IEF tl < |s| V ite(t = +0, s Z £0, t Z too) (1) 


In contrast to the condition from Table 2, this version does not involve rem.
It follows that certain applications of floating-point remainder, including those
whose first argument is an unconstrained variable, can be eliminated based on
this equivalence. Interestingly, for $s \,\mathrm{rem}\, x \approx t$, we did not succeed in finding an
invertibility condition. This case appears not to admit a concise solution; we
discuss further details below.
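The claim that t is a conditional inverse for $x \,\mathrm{rem}\, s \approx t$ can be checked exhaustively for a fixed small format, in the spirit of the fixed-format checks described in Sect. 4. The sketch below is an illustration, not the authors' tooling: it encodes values of $F_{\epsilon,\sigma}$ as hypothetical tuples with exact `Fraction` magnitudes, collapses all NaNs into one value (as in footnote 2), and implements IEEE remainder as $x - n \cdot s$ with n the integer nearest to $x/s$ (ties to even), returning NaN for an infinite x, a zero s, or NaN operands.

```python
from fractions import Fraction

NAN = ("nan",)   # the single canonical NaN


def values(eb: int, sb: int) -> set:
    """Distinct values of F_{eb,sb}: ("fin", sign, |v|), ("inf", sign) or NAN."""
    bias, frac = 2 ** (eb - 1) - 1, sb - 1       # sb counts the hidden bit
    out = {NAN, ("inf", 1), ("inf", -1)}
    for sign in (1, -1):
        for e in range(2 ** eb - 1):             # all-ones exponent excluded
            lead, exp = (0, 1 - bias) if e == 0 else (1, e - bias)
            for m in range(2 ** frac):
                out.add(("fin", sign,
                         (lead + Fraction(m, 2 ** frac)) * Fraction(2) ** exp))
    return out


def rem(x, s):
    """IEEE remainder x - n*s with n = round-half-even(x / s)."""
    if x == NAN or s == NAN or x[0] == "inf":
        return NAN
    if s[0] == "inf":
        return x                                 # finite x rem +-inf = x
    if s[2] == 0:
        return NAN                               # remainder by zero
    xv, sv = x[1] * x[2], s[1] * s[2]
    r = xv - round(xv / sv) * sv                 # Fraction rounds half to even
    if r == 0:
        return ("fin", x[1], Fraction(0))        # a zero result keeps x's sign
    return ("fin", 1 if r > 0 else -1, abs(r))


def check_conditional_inverse(eb: int, sb: int) -> bool:
    """(exists x. x rem s = t) <=> (t rem s = t) for all s, t in F_{eb,sb}."""
    vals = values(eb, sb)
    for s in vals:
        image = {rem(x, s) for x in vals}
        if any((t in image) != (rem(t, s) == t) for t in vals):
            return False
    return True
```

Computing the image of `rem(·, s)` once per s keeps the check quadratic in the number of values (227 for $F_{3,5}$) instead of cubic.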

Table 3 gives the invertibility conditions for $\geq$. Since these constraints admit
more solutions, they typically have simpler invertibility conditions. In particular,
with the exception of rem, all conditions only involve floating-point classifiers.

When considering literals with predicates, the invertibility conditions for
cases involving $x + s$ and $s - x$ are identical for every predicate and rounding
mode. This is due to the fact that $s - x$ is equivalent to $s + (-x)$, independent
from the rounding mode. Thus, the negation of the inverse value of x for
an equation involving $x + s$ is the inverse value of x for an equation involving
$s - x$. Similarly, the invertibility conditions for $x \cdot s$ and $s \div x$ over predicates
$\{<, \leq, >, \geq$, isInf, isNaN, isNeg, isZero$\}$ are identical for all rounding modes.

For all predicates except $\{\approx$, isNorm, isSub$\}$, the invertibility conditions for
operators $\{+, -, \cdot, \div\}$ contain floating-point classifiers only. All of these conditions
are also independent from the rounding mode. Similarly, for operator fma
over predicates {isInf, isNaN, isNeg, isPos}, the invertibility conditions contain

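Table 3. Invertibility conditions for floating-point operators (excl. fma) with $\geq$.


Literal                                     Invertibility condition
$x \overset{R}{+} s \geq t$                 $(\mathrm{isPos}(s) \vee \mathrm{ite}(s \approx \pm\infty, (t \approx \pm\infty \wedge \mathrm{isNeg}(t)), \mathrm{isNeg}(s))) \wedge t \not\approx \mathrm{NaN}$
$x \overset{R}{-} s \geq t$                 $\mathrm{ite}(\mathrm{isNeg}(s), t \not\approx \mathrm{NaN}, \mathrm{ite}(s \approx \pm\infty, (t \approx \pm\infty \wedge \mathrm{isNeg}(t)), (\mathrm{isPos}(s) \wedge t \not\approx \mathrm{NaN})))$
$s \overset{R}{-} x \geq t$                 $(\mathrm{isPos}(s) \vee \mathrm{ite}(s \approx \pm\infty, (t \approx \pm\infty \wedge \mathrm{isNeg}(t)), \mathrm{isNeg}(s))) \wedge t \not\approx \mathrm{NaN}$
$x \overset{R}{\cdot} s \geq t$             $(\mathrm{isNeg}(t) \vee t \approx \pm 0 \vee s \not\approx \pm 0) \wedge s \not\approx \mathrm{NaN} \wedge t \not\approx \mathrm{NaN}$
$x \overset{R}{\div} s \geq t$              $(\mathrm{isNeg}(t) \vee t \approx \pm 0 \vee s \not\approx \pm\infty) \wedge s \not\approx \mathrm{NaN} \wedge t \not\approx \mathrm{NaN}$
$s \overset{R}{\div} x \geq t$              $(\mathrm{isNeg}(t) \vee t \approx \pm 0 \vee s \not\approx \pm 0) \wedge s \not\approx \mathrm{NaN} \wedge t \not\approx \mathrm{NaN}$
$x \,\mathrm{rem}\, s \geq t$               $\mathrm{ite}(\mathrm{isNeg}(t), s \not\approx \mathrm{NaN}, (t \overset{\mathrm{RTP}}{+} t \leq |s| \wedge t \not\approx \pm\infty)) \wedge s \not\approx \pm 0$
$s \,\mathrm{rem}\, x \geq t$               ?
$\overset{R}{\sqrt{x}} \geq t$              $t \not\approx \mathrm{NaN}$
$|x| \geq t$                                $t \not\approx \mathrm{NaN}$
$-x \geq t$                                 $t \not\approx \mathrm{NaN}$
$\overset{R}{\mathrm{rti}}(x) \geq t$       $t \not\approx \mathrm{NaN}$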
only floating-point classifiers. All of these conditions except for isNeg(fma(x, s, t))
and isPos(fma(x, s, t)) are also independent from the rounding mode.

For all floating-point operators with predicate isNaN, the invertibility condition
is $\top$, i.e., an inverse value for x always exists. This is due to the fact that
every floating-point operator returns NaN if one of its operands is NaN, hence
NaN can be picked as an inverse value of x. Conversely, we identified four cases
for which the invertibility condition is $\bot$, i.e., an inverse value for x never exists.
These four cases are isNeg(|x|), isInf(x rem s), isInf(s rem x), and isSub(rti(x)). For
the first three cases, it is obvious why no inverse value exists. The intuition for
isSub(rti(x)) is that integers are not subnormal, and as a result if x is rounded to
an integer it can never be a subnormal number. All of these cases can be easily
implemented as rewrite rules in an SMT solver.

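For operator fma, the invertibility conditions over predicates {isInf, isNaN,
isNeg, isPos} contain floating-point classifiers only. For predicate isZero, the
invertibility conditions are more involved. Equations (2) and (3) show the invertibility
conditions for isZero(fma(x, s, t)) and isZero(fma(s, t, x)) for all rounding
modes R.

$$\overset{R}{\mathrm{fma}}(-(t \overset{\mathrm{RTP}}{\div} s), s, t) \approx \pm 0 \;\vee\; \overset{R}{\mathrm{fma}}(-(t \overset{\mathrm{RTN}}{\div} s), s, t) \approx \pm 0 \;\vee\; (s \approx \pm 0 \wedge t \approx \pm 0) \qquad (2)$$

$$\overset{R}{\mathrm{fma}}(s, t, -(s \overset{\mathrm{RTP}}{\cdot} t)) \approx \pm 0 \;\vee\; \overset{R}{\mathrm{fma}}(s, t, -(s \overset{\mathrm{RTN}}{\cdot} t)) \approx \pm 0 \qquad (3)$$

These two invertibility conditions contain case splits similar to those in Table 2 and
indicate that, e.g., $-(t \overset{\mathrm{RTP}}{\div} s)$ is an inverse value for x when $\overset{R}{\mathrm{fma}}(-(t \overset{\mathrm{RTP}}{\div} s), s, t) \approx \pm 0$
holds.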

As we will describe in Sect. 4, an important aspect of synthesizing these
invertibility conditions was considering their visualizations. This helped us deter- 
mine which invertibility conditions were relatively simple and which exhibited 
complex behavior. 


(a) $x + s \approx t$   (b) $x \cdot s \approx t$   (c) $x \div s \approx t$   (d) $s \div x \approx t$


Fig. 1. Invertibility conditions for $\{+, \cdot, \div\}$ over $\approx$ for $F_{3,5}$ and rounding mode RNE.
(a) $x \,\mathrm{rem}\, s \approx t$   (b) $s \,\mathrm{rem}\, x \approx t$


Fig. 2. Invertibility conditions for rem over $\approx$ for $F_{3,5}$.


Figure 1 shows the visualizations of the invertibility conditions for operators
$\{+, \cdot, \div\}$ over $\approx$ from Table 2 for sort $F_{3,5}$ with rounding mode RNE (each of the
literals is two-dimensional). We use 227 × 227 pixel maps over all possible values
of s and t, where the pixel at point (s, t) is white if the invertibility condition is
true, and black if it is false. The values of s are plotted on the horizontal axis
and the values of t are plotted on the vertical axis. The leftmost two columns
(resp. topmost two rows) give the value of the invertibility condition for s = ±0
(resp. t = ±0); the rightmost column (resp. bottom row) gives its value for NaN;
the next two columns left of (resp. next two rows on top of) NaN give its value
for ±∞; the remainder plots the values of the subnormal and normal values of
s and t, left-to-right (resp. top-to-bottom) in increasing order of their absolute
value, alternating between positive and negative values. These visualizations give
an intuition of the complexity of the behavior of invertibility conditions, which
is a consequence of the complex semantics of floating-point operations.

Figure 2 gives the invertibility condition visualizations for remainder over
$\approx$ with sort $F_{3,5}$ and rounding mode RNE. The visualization on the left-hand side
shows that solving for x as the first argument is relatively easy. It suggests that
an invertibility condition for this case involves a linear inequality relating the
absolute values of s and t, which we were able to derive in Eq. (1). Solving for x
as the second argument, on the other hand, is much more difficult, as indicated
by the right picture, which has a significantly more complex structure. We conjecture
that no simple solution exists for the latter problem. The visualization of
the invertibility condition gives some of the intuition for this: the diagonal divide
is caused by the fact that output t will always have a smaller absolute value than
the input s. The top-left corner represents subnormal/subnormal computation;
it acts as a fixed point and behaves differently from the rest of the function.
The stepped blocks along the diagonal occur when s and t have the same exponent,
and thus the pattern is similar to the invertibility condition for + shown in
Fig. 1. Portions right of the main diagonal appear to exhibit random behavior.


² Notice that we consider all possible $(2^{\sigma-1} - 1) \cdot 2$ NaN values of $T_{FP}$ as one single NaN
value. Thus, for sort $F_{3,5}$, we have 227 floating-point values (instead of $2^8 = 256$).
(a) $x \,\mathrm{rem}\, s \geq t$   (b) $x \,\mathrm{rem}\, s > t$   (c) $s \,\mathrm{rem}\, x \geq t$   (d) $s \,\mathrm{rem}\, x > t$


Fig. 3. Invertibility conditions for rem over inequalities for $F_{3,5}$.


(a) $\mathrm{fma}(x, s, t) \approx \pm 0$   (b) $\mathrm{fma}(s, t, x) \approx \pm 0$   (c) $\mathrm{isSub}(\mathrm{fma}(x, s, t))$   (d) $\mathrm{isSub}(\mathrm{fma}(s, t, x))$


Fig. 4. Invertibility conditions for fma over {isZero, isSub} for $F_{3,5}$ and rnd. mode RNE.


We believe this is the result of repeated cancellations in the computation of the 
remainder for those values, which suggests a behavior that we believe is similar 
to the Blum-Blum-Shub random number generator [9]. 

For remainder with inequalities, we succeeded in determining invertibility
conditions for $\leq$ and $\geq$ if x is the first argument. However, for $x \,\mathrm{rem}\, s$ over
$\{<, >\}$, and $s \,\mathrm{rem}\, x$ over $\{<, \leq, >, \geq\}$ we did not. This is particularly surprising
considering that the invertibility conditions for non-strict and strict inequalities 
are nearly identical (varying only by a handful of pixels), as shown in Fig. 3. 
Note that for x as the first argument, all variations of the concise invertibility 
conditions for non-strict inequality we considered failed as solutions for the strict 
inequality. This behavior is representative of the many subtle corner cases we 
encountered while synthesizing these conditions. 

Figure 4 shows visualizations for invertibility conditions involving fma. The left 
two images are visualizations for the invertibility conditions for isZero. The corre- 
sponding invertibility conditions are given in Eqs. (2) and (3) above. We were not 
able to determine invertibility conditions for operator fma over predicate isSub, 
which are visualized in the rightmost two pictures in Fig. 4. Finally, we did not 
succeed in finding invertibility conditions for fma with binary predicates, which 
are particularly challenging since they are three-dimensional. Finding solutions for 
these cases is ongoing work (see Sect. 4 for a more in-depth discussion). 


Invertibility Conditions for Floating-Point Formulas 125 


4 Synthesis of Floating-Point Invertibility Conditions 


Deriving invertibility conditions in $T_{FP}$ is a highly challenging task. We were
unable to derive these conditions manually despite our substantial background 
knowledge of floating-point numbers. As a consequence, we developed a custom 
extension of the syntax-guided synthesis (SyGuS) paradigm [1] with the goal of 
finding invertibility conditions automatically, which resulted in the conditions 
from Sect. 3. While the extension was optimized for this task, we stress that
our techniques are theory-agnostic and can be used for synthesis problems over 
any finite domain. Our approach builds upon the SyGuS capabilities of the SMT 
solver CVC4 [5, 29], which has recently been extended to support reasoning about 
the theory of floating-points [11]. We use the invertibility condition for floating- 
point addition with equality here as a running example. 

Establishing an invertibility condition requires solving a synthesis problem 
with three levels of quantifier alternation. In particular, for floating-point addi- 
tion with equality, we are interested in finding a solution for predicate IC that 
satisfies the conjecture: 


$$\exists IC.\, \forall s, t.\, \big(IC(s, t) \Leftrightarrow \exists x.\, x +^{R} s \approx t\big) \qquad (4)$$


for some rounding mode R. In other words, this conjecture states that IC(s, t)
holds exactly when there exists an x that, when rounding the result of adding x
to s according to mode R, yields t. Furthermore, we are interested in finding a
solution for IC that holds independently of the format of x, s, t. Note that SMT
solvers are not capable of reasoning about constraints that are parametric in the
floating-point format. To address this challenge, following the methodology from
previous work [26], our strategy for establishing (general) invertibility conditions
first solves the synthesis conjecture for a fixed format F_{ε,σ}, and subsequently
checks whether that solution also holds for other formats. The choice of the
number of exponent bits ε and significand bits σ in F_{ε,σ} balances two criteria:


1. ε, σ should be large enough to exercise many (or all) of the behaviors of the
operators and relations in our synthesis conjecture,
2. ε, σ should be small enough for the synthesis problem to be tractable.


In our experience, the best choices for (ε, σ) depended on the particular
invertibility condition we were solving. The most common choices for (ε, σ) were
(3, 5), (4, 5), and (4, 6). For most two-dimensional invertibility conditions (those
that involve two variables s and t), we used (3, 5), since the required synthesis
procedures mentioned below were roughly eight times faster than for (4, 5). For
one-dimensional invertibility conditions, we often used higher precision formats.
Since floating-point operators like addition take a rounding mode R as an
additional argument, we assumed a fixed rounding mode when solving, and then
cross-checked our solution for multiple rounding modes.
Assume we have chosen to synthesize the invertibility condition for conjecture (4)
for format F_{3,5} and rounding mode RNE. Notice that current SyGuS
solvers [2, 29] support only two levels of quantifier alternation. However, we can
expand the innermost quantifier in this conjecture to obtain the conjecture:


$$\exists IC.\, \forall s, t.\, \Big(IC(s, t) \Leftrightarrow \bigvee_{i=0}^{226} i +^{RNE} s \approx t\Big) \qquad (5)$$

where for simplicity of notation we use i = 0, ..., 226 to denote the values of
F_{3,5}. This methodology was also used in Niemetz et al. [26], where invertibility
conditions for bit-vector operators were synthesized for bit-width 4 by giving
the conjecture of the above form to an off-the-shelf SyGuS solver. In contrast
to that work, we found that the synthesis conjecture above is too challenging
to be solved efficiently by current state-of-the-art enumerative SyGuS solvers.
The reason for this is twofold. First, the smallest viable floating-point format is
3 + 5 = 8 bits, which requires the body of (5) to have a large number
of disjuncts (227), more than ten times the 16 disjuncts
required when synthesizing 4-bit invertibility conditions for bit-vectors. Second,
floating-point formulas are much harder to solve than bit-vector formulas, due to
the complexity of their bit-blasted encodings. Thus, a challenging
satisfiability query must be solved for each candidate considered within the
SyGuS solver.
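The disjunct count above can be checked with a short calculation. The sketch below is our own illustration, assuming the SMT-LIB convention that a value of F_{ε,σ} is encoded in 1 + ε + (σ − 1) bits (σ counts the hidden bit), that +0 and −0 are distinct values, and that all NaN bit patterns denote a single value; the function name `num_fp_values` is hypothetical.

```python
def num_fp_values(eb: int, sb: int) -> int:
    """Number of distinct values of F_{eb,sb}, with all NaN encodings identified.

    eb: exponent bits; sb: significand bits including the hidden bit,
    so each value is encoded in 1 (sign) + eb + (sb - 1) = eb + sb bits.
    """
    patterns = 2 ** (eb + sb)
    # NaN encodings: exponent field all ones and a nonzero trailing
    # significand, for either sign bit.
    nan_patterns = 2 * (2 ** (sb - 1) - 1)
    return patterns - nan_patterns + 1  # all NaN patterns collapse to one value


for eb, sb in [(3, 5), (4, 5), (4, 6)]:
    print((eb, sb), num_fp_values(eb, sb))
```

This yields 227 values for F_{3,5} and 483 for F_{4,5}, which are the table dimensions used for the full I/O specifications in this section, and 963 for F_{4,6}.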

To address the above challenges, we perform a more extreme preprocessing 
step on our synthesis conjecture, which computes the input/output behavior of 
the invertibility condition on all points in the domain of s and t. In other words, 
we rephrase our synthesis conjecture as: 


$$\exists IC.\, \bigwedge_{i=0}^{226} \bigwedge_{j=0}^{226} \big(IC(i, j) \Leftrightarrow c_{i,j}\big) \qquad (6)$$


where each c_{i,j} is a Boolean constant (either ⊤ or ⊥) determined by a
quantifier-free satisfiability query. In particular, for each pair of floating-point
values (i, j), constant c_{i,j} is ⊤ if x +^{RNE} i ≈ j is satisfiable, and ⊥ if it is
unsatisfiable. In practice, we represent the above conjecture as a 227 × 227 table,
which we call the full I/O specification of invertibility condition IC. In our
experiments, computing this table for most two-dimensional invertibility conditions
of sort F_{3,5} required 15 min (for 227 × 227 = 51,529 quantifier-free queries), and
2 h for sort F_{4,5} (requiring 483 × 483 = 233,289 queries). This process was
accelerated by first applying random sampling over possible values of x to quickly
test if a query was satisfiable. For some operators, notably remainder, this required
significantly more time than for others (up to a factor of 2). Due to the high cost
of this preprocessing step, we generated a database with the full I/O specifications
for all invertibility conditions from Sect. 3 using a cluster of 50 nodes with Intel
Xeon E5-2637 with 3.5 GHz and 32 GB memory, and then shared this database
among multiple developers. Computing the full I/O specifications for F_{3,5}, F_{4,5},
and F_{4,6} required a total of 459 days of CPU time (6.1 for F_{3,5}, 54.7 for F_{4,5},
and 398.5 for F_{4,6}). Despite the heavy cost of this step, it was crucial for
accelerating our framework for synthesizing invertibility conditions, described next.
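A minimal sketch of how such a table can be computed, including the random-sampling shortcut. Because a faithful floating-point implementation does not fit in a short example, the sketch swaps in a hypothetical toy literal, x · s = t over 3-bit unsigned integers (multiplication modulo 8), and replaces each quantifier-free SMT query with an exhaustive scan over x. The names `full_io_spec`, `mul_eq`, and `ic_mul_eq` are illustrative assumptions, not part of CVC4.

```python
import random
from typing import Callable, List


def full_io_spec(domain: List[int], literal: Callable[[int, int, int], bool],
                 samples: int = 4, seed: int = 0) -> List[List[bool]]:
    """Table c with c[i][j] True iff some x in domain satisfies
    literal(x, domain[i], domain[j]) (the analogue of the constants c_{i,j})."""
    rng = random.Random(seed)
    table = []
    for s in domain:
        row = []
        for t in domain:
            # Cheap pass: a few random values of x often already witness satisfiability.
            found = any(literal(rng.choice(domain), s, t) for _ in range(samples))
            # Exhaustive pass (standing in for the SMT query) only if sampling failed.
            row.append(found or any(literal(x, s, t) for x in domain))
        table.append(row)
    return table


# Toy stand-in for a floating-point literal: x * s = t over 3-bit unsigned values.
MASK = 0b111
DOMAIN = list(range(8))


def mul_eq(x: int, s: int, t: int) -> bool:
    return (x * s) & MASK == t


def ic_mul_eq(s: int, t: int) -> bool:
    """For s != 0: t is a multiple of the largest power of two dividing s; for s == 0: t == 0."""
    return ((-s | s) & t & MASK) == t


spec = full_io_spec(DOMAIN, mul_eq)
print(all(spec[s][t] == ic_mul_eq(s, t) for s in DOMAIN for t in DOMAIN))  # True
```

Sampling can only ever report a genuine witness, so it never changes the table; it just avoids the expensive exhaustive check for cells that are easily shown satisfiable.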


[Fig. 5 diagram: Candidate, Solver, cex-guided sampling, Samples, Full I/O Spec, User, IC Problem]


Fig. 5. Architecture for synthesizing invertibility conditions for floating point formulas. 


Figure 5 summarizes our architecture for solving synthesis conjectures of the 
above form. The user first selects an invertibility condition problem to solve, 
where we assume the full I/O specification has been computed using the afore- 
mentioned techniques. At a high level, our architecture can be seen as an inter- 
active synthesis environment, where the user manages the interaction between 
two subprocedures: 


1. a SyGuS solver with support for decision tree learning, and 
2. a solution verifier storing the full I/O specification of the invertibility condition.


We use a counterexample-guided loop, where the SyGuS solver provides the 
solution verifier with candidate solutions, and the solution verifier provides the 
SyGuS solver with an evolving subset of sample points taken from the full I/O 
specification. These points correspond to counterexamples to failed candidate 
solutions, and are sampled in a uniformly random manner over the domain of 
our specification. To accelerate the speed at which our framework converges on a 
solution, we configure the solution verifier to generate multiple counterexample 
points (typically 10) for each iteration of the loop. The process terminates when 
the SyGuS solver generates a candidate solution that is correct for all points 
according to its full I/O specification. 
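The loop can be sketched as follows. This is a hedged illustration, not the CVC4 implementation: the learner is a stand-in that returns the first candidate from a fixed, hypothetical pool that is consistent with the current samples (rather than a decision tree learner), and the full I/O specification is a toy 3-bit table (is x · s = t mod 8 solvable?) used in place of a floating-point one. As in the text, the verifier returns up to k = 10 counterexamples per iteration, drawn uniformly at random from all failing points.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[int, int]
Pred = Callable[[int, int], bool]

MASK = 0b111      # toy domain: 3-bit unsigned values
DOMAIN = range(8)
# Toy full I/O specification: is x * s = t (mod 8) solvable for some x?
SPEC: Dict[Point, bool] = {
    (s, t): any((x * s) & MASK == t for x in DOMAIN) for s in DOMAIN for t in DOMAIN
}

# Hypothetical candidate pool, ordered from simplest to most complex.
CANDIDATES: List[Tuple[str, Pred]] = [
    ("true", lambda s, t: True),
    ("t = 0", lambda s, t: t == 0),
    ("s odd", lambda s, t: s % 2 == 1),
    ("(-s | s) & t = t", lambda s, t: ((-s | s) & t & MASK) == t),
]


def learner(samples: Dict[Point, bool]) -> Optional[Tuple[str, Pred]]:
    """Stand-in for the SyGuS solver: first candidate consistent with all samples."""
    for name, pred in CANDIDATES:
        if all(pred(s, t) == v for (s, t), v in samples.items()):
            return name, pred
    return None


def verifier(pred: Pred, k: int, rng: random.Random) -> List[Point]:
    """Up to k counterexamples, sampled uniformly at random from all failing points."""
    failing = [pt for pt, v in SPEC.items() if pred(*pt) != v]
    return rng.sample(failing, min(k, len(failing)))


def cegis(k: int = 10, seed: int = 0, max_rounds: int = 50) -> Tuple[Optional[str], int]:
    """Alternate learner and verifier until a candidate matches the whole SPEC."""
    rng = random.Random(seed)
    samples: Dict[Point, bool] = {}
    for rnd in range(1, max_rounds + 1):
        cand = learner(samples)
        if cand is None:
            return None, rnd      # no candidate in the pool fits the samples
        name, pred = cand
        cex = verifier(pred, k, rng)
        if not cex:
            return name, rnd      # correct on every point of the specification
        samples.update((pt, SPEC[pt]) for pt in cex)
    return None, max_rounds


print(cegis())
```

Each round refutes the current candidate, because the counterexamples added to `samples` disagree with it, so the loop ends within `len(CANDIDATES)` rounds whenever the pool contains a correct candidate.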

We give the user control over both the solutions and counterexample points 
generated in this loop. First, as is commonly done in syntax-guided synthesis 
applications, the user in our workflow provides an input grammar to the SyGuS 
solver. This is a context-free grammar in a standard format [28], which contains 
a guess of the operators and patterns that may be involved in the invertibility 
condition we are synthesizing. Second, note that the domain of floating-point 
numbers can be subdivided into a number of subdomains and special cases (e.g. 
normal, subnormal, not-a-number, infinity), as well as split into different clas- 
sifications (e.g. positive and negative). Our workflow allows the user to provide 
a side condition, whose purpose is to focus on finding an invertibility condition
that is correct for one of these subdomains. The side condition acts as a filtering
mechanism on the counterexample points generated by the solution verifier.
For example, given the side condition isNorm(s) ∧ isNorm(t), the solution verifier
checks candidate solutions generated by the SyGuS solver only against points
(s, t) where both arguments are normal, and consequently only communicates
counterexamples of this form to the SyGuS solver. The solution verifier may
also be configured to establish that the current candidate solution generated by
the SyGuS solver is conditionally correct, that is, it is true on all points in the
domain that satisfy the side condition.
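Side-condition filtering can be sketched in a few lines. The sketch uses a toy 3-bit table (is x · s = t mod 8 solvable?) instead of a floating-point specification, and the side condition `s_is_odd` is a hypothetical stand-in for a condition such as isNorm(s) ∧ isNorm(t); all names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]
Pred = Callable[[int, int], bool]

# Toy full I/O specification: is x * s = t (mod 8) solvable for some x?
SPEC: Dict[Point, bool] = {
    (s, t): any((x * s) % 8 == t for x in range(8)) for s in range(8) for t in range(8)
}


def counterexamples(cand: Pred, side: Pred) -> List[Point]:
    """Points that satisfy the side condition and on which cand disagrees with SPEC."""
    return [pt for pt, expected in SPEC.items() if side(*pt) and cand(*pt) != expected]


def conditionally_correct(cand: Pred, side: Pred) -> bool:
    """True if cand agrees with SPEC on every point satisfying the side condition."""
    return not counterexamples(cand, side)


def always_true(s: int, t: int) -> bool:
    return True


def s_is_odd(s: int, t: int) -> bool:   # hypothetical side condition
    return s % 2 == 1


def no_side_condition(s: int, t: int) -> bool:
    return True


print(conditionally_correct(always_true, s_is_odd))          # True
print(len(counterexamples(always_true, no_side_condition)))  # 21
```

The trivial candidate is conditionally correct under the side condition, since every t is reachable when s is odd, yet it fails on 21 points of the unrestricted table. Solving an easy subdomain first and then attacking the remaining cases separately is the usage pattern this feature supports.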

There are several advantages to the form of the synthesis conjecture in (6) 
that we exploit in our workflow. First, its structure makes it easy to divide the
problem into sub-cases: at any time, our synthesis workflow sends only the
conjuncts of (6) for a subset of the (i, j) pairs. As a result, we do not burden the
underlying SyGuS solver with the entire conjecture at once, which would not 
scale in practice. A second advantage is that it is in programming-by-examples 
(PBE) form, since it consists of a conjunction of concrete input-output pairs. 
As a consequence, specialized algorithms can be used by the SyGuS solver to 
generate solutions for (approximations of) our conjecture in a way that is highly 
scalable in practice. These techniques are broadly referred to as decision tree 
learning or unification algorithms. As a brief review (see Alur et al. [2] for a 
recent SyGuS-based approach), a decision tree learning algorithm is given as 
input a set of good examples c_1 ↦ ⊤, ..., c_n ↦ ⊤ and a set of bad examples
d_1 ↦ ⊥, ..., d_m ↦ ⊥. The goal of a decision tree algorithm is to find a predicate,
or classifier, that evaluates to true on all the good examples, and false on all 
the bad examples. In our context, a classifier is expressed as an if-then-else tree 
of Boolean sort. Sampling the space of conjecture (6) provides the decision tree 
algorithm with good and bad examples and the returned classifier is a candidate 
solution that we give to the solution verifier. The SyGuS solver of CVC4 uses 
a decision-tree learning algorithm, which we rely on in our workflow. Due to 
the scalability of this algorithm and the fact that only a small subset of our 
conjecture is considered at any given time, candidate solutions are typically 
generated by the SyGuS solver in our framework in a matter of seconds. 

Another important aspect of the SyGuS solver in Fig. 5 is that it is configured 
to generate multiple solutions for the current set of sample points. Due to the 
way the SyGuS-based decision-tree learning algorithm works, these solutions 
tend to become more general over the runtime of the solver. As a simple example
(assuming exact integer arithmetic), say the solver is given input points
(1, 1) ↦ ⊤, (2, 0) ↦ ⊤, (1, 0) ↦ ⊥, and (0, 1) ↦ ⊥ for (s, t). It enumerates
predicates over s and t, starting with the simplest predicates first, say s ≈ 0,
t ≈ 0, s ≈ 1, t ≈ 1, s + t > 1, and so on. After generating the first four predicates,
it constructs the solution ite(s ≈ 1, t ≈ 1, t ≈ 0), which is a correct classifier for
the given set of points. However, after generating the fifth predicate in this list,
it returns s + t > 1 itself as a solution; this can be seen as a generalization of the
previous solution since it requires no case splitting.
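The example can be reproduced with a small, hypothetical learner. It is not CVC4's decision-tree algorithm: it returns a single predicate that classifies every example if one exists, and otherwise searches for an if-then-else split of bounded depth whose branches can be classified recursively. Re-running it as the predicate pool grows mimics the enumeration order described above.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Pred = Tuple[str, Callable[[int, int], bool]]  # (printable name, evaluator)


def learn(examples: List[Tuple[Point, bool]], preds: List[Pred], depth: int) -> Optional[str]:
    """Return a classifier (as text) over `preds` that agrees with every example,
    using at most `depth` nested ite splits, or None if there is none."""
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all examples agree: constant leaf
        return "true" if labels.pop() else "false"
    for name, p in preds:                     # a single predicate that fits all examples
        if all(p(*pt) == label for pt, label in examples):
            return name
    if depth == 0:
        return None
    for name, p in preds:                     # try each predicate as an ite condition
        pos = [(pt, lab) for pt, lab in examples if p(*pt)]
        neg = [(pt, lab) for pt, lab in examples if not p(*pt)]
        if not pos or not neg:
            continue                          # predicate does not split the examples
        then_c = learn(pos, preds, depth - 1)
        else_c = learn(neg, preds, depth - 1)
        if then_c is not None and else_c is not None:
            return f"ite({name}, {then_c}, {else_c})"
    return None


# Predicates in the order the solver enumerates them (exact integer arithmetic).
PREDICATES: List[Pred] = [
    ("s = 0", lambda s, t: s == 0),
    ("t = 0", lambda s, t: t == 0),
    ("s = 1", lambda s, t: s == 1),
    ("t = 1", lambda s, t: t == 1),
    ("s + t > 1", lambda s, t: s + t > 1),
]
EXAMPLES = [((1, 1), True), ((2, 0), True), ((1, 0), False), ((0, 1), False)]

for k in range(1, len(PREDICATES) + 1):
    print(k, learn(EXAMPLES, PREDICATES[:k], depth=1))
```

Running this prints None for the first three prefixes, ite(s = 1, t = 1, t = 0) once the fourth predicate is available, and s + t > 1 once the fifth is, matching the progression in the text.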
Since more general candidate solutions have a higher likelihood of being
actual solutions in our experience, our workflow critically relies on the ability of 
users to manually terminate the synthesis procedure when they are satisfied with 
the last generated candidate. Our synthesis procedure logs a list of candidate 
solutions that satisfy the conjecture on the current set of sample points. When 
the user terminates the synthesis process, the solution verifier will check the last 
solution generated in this list. Users have the option to rearrange the elements 
of this list by hand, if they have an intuition that a specific candidate is more 
likely to be correct—and so should be tested first. 


Experience. The first challenging invertibility condition we solved with our 
framework was addition with equality for rounding mode RNE. Initially, we used 
a generic grammar that contained the entire floating-point signature. As a first
key step towards solving this problem, the synthesis procedure suggested the single
literal t ≈ s +^{RNE} (t −^{RNE} s) as a candidate solution. Although counterexamples
were found for this candidate, we noticed that it satisfied over 98% of the
specification, and a visualization of its I/O behavior showed similar patterns to the
invertibility condition we were solving for. Based on these observations, we focused
our grammar towards literals of this form. In particular, we used a function that
takes two floating-points x, y and two rounding modes R1, R2 as arguments and
returns x +^{R1} (y −^{R2} x) as a builtin symbol of our grammar. We refer to such a
function as a residual computation of y, noting that its value is often approximately y. By
including various functions for residual computations, we focused the effort of 
the synthesizer on more interesting predicates. The end solution involved multi- 
ple residual computations, as shown in Table 2. Our initial solution was specific 
to the rounding mode RNE. After solving for several other rounding modes, we 
were able to construct a parametric solution that was correct for all rounding 
modes. In total, it took roughly three days of developer time to discover the 
generalized invertibility condition for addition with equality. Many of the sub- 
sequent invertibility conditions took a matter of hours, since by then we had a 
good intuition for the residual computations that were relevant for each case. 
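To make the notion of a residual computation concrete, the following sketch evaluates x +^{R1} (y −^{R2} x) with each operation rounded in its own mode. Python's built-in floats only round to nearest-even, so the sketch substitutes the standard `decimal` module as an analogue: a 3-digit decimal format (an assumption, not an IEEE binary format) in which `ROUND_HALF_EVEN` plays the role of RNE and `ROUND_CEILING` that of RTP. The function name `residual` is illustrative.

```python
from decimal import ROUND_CEILING, ROUND_HALF_EVEN, Context, Decimal

PREC = 3  # toy format: 3 significant decimal digits


def residual(x: Decimal, y: Decimal, r1: str, r2: str) -> Decimal:
    """Compute x +_{r1} (y -_{r2} x), rounding each operation to PREC digits."""
    diff = Context(prec=PREC, rounding=r2).subtract(y, x)
    return Context(prec=PREC, rounding=r1).add(x, diff)


x, y = Decimal("100"), Decimal("0.123")
print(residual(x, y, ROUND_HALF_EVEN, ROUND_HALF_EVEN))  # 0.1
print(residual(x, y, ROUND_HALF_EVEN, ROUND_CEILING))    # 0.2
print(residual(Decimal("1"), Decimal("0.5"), ROUND_HALF_EVEN, ROUND_HALF_EVEN))  # 0.5
```

When y is small relative to x, the inner subtraction drops y's low-order digits, so the residual only approximates y, and the chosen rounding modes decide which neighbor it lands on. When no precision is lost, as in the last call, the residual equals y exactly.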

Invertibility conditions involving rem, fma, isNorm, and isSub were challeng- 
ing and required further customizations to the grammar, for instance to include 
constants that corresponded to the minimum and maximum normal and subnormal
values. Three-dimensional invertibility conditions (which in this work are
limited to cases of fma with binary predicates) were especially challenging since
the domain of their conjecture is a factor of 227 larger for F_{3,5} than the others.
Following our strategy for solving the invertibility conditions for specific formats
and rounding modes, in ongoing work we are investigating solving these cases
by first solving the invertibility condition for a fixed value c for one of its free
variables u. Solving a two-dimensional problem of this form with a solution φ
may suggest a generalization that works for all values of u, where all occurrences
of c in φ are replaced by u.

We found the side condition feature of our workflow important for narrowing 
down which subdomain was the most challenging for the conjecture in question. 
For instance, for some cases it was very easy to find invertibility conditions that
held when both s and t were normal (resp., subnormal), but very difficult when 
s was normal and t was subnormal or vice versa. 

We also implemented a fully automated mode for the synthesis loop in Fig. 5. 
However, in practice, it was more effective to tweak the generated solutions 
manually. The amount of user interaction was not prohibitively high in our 
experience. 

Finally, we found that it was often helpful to visualize the input/output 
behavior of candidate solutions. In many cases, the difference between a candi- 
date solution and the desired behavior of the invertibility condition would reveal 
a required modification to the grammar or would suggest which parts of the 
domain of the conjecture to focus on. 


4.1 Verifying Conditions for Multiple Formats and Rounding Modes


We verified the correctness of all 167 invertibility conditions by checking them
against their corresponding full I/O specification for floating-point formats F_{3,5},
F_{4,5}, and F_{4,6} and all rounding modes, which required 1.6 days of CPU time. This
is relatively cheap compared to computing the specifications, since checking is
essentially constant evaluation of invertibility conditions for all possible input
values. However, this approach quickly becomes infeasible with increasing precision,
since the time required for computing the I/O specification roughly increases by a
factor of 8 for each bit.
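A back-of-the-envelope reading of this factor for a two-dimensional condition, under the simplifying assumption (ours, not a measured property of the SMT queries) that deciding one table cell costs time roughly proportional to the number of candidate values of x. With n = ε + σ bits there are about 2^n values per variable, so

$$\underbrace{2^{n}\cdot 2^{n}}_{(s,t)\ \text{cells}}\;\cdot\;\underbrace{2^{n}}_{\text{values of } x} = 2^{3n}, \qquad \frac{2^{3(n+1)}}{2^{3n}} = 2^{3} = 8.$$

The CPU times reported earlier in this section fit this estimate: 15 min versus 2 h per condition, and 54.7/6.1 ≈ 9.0 from F_{3,5} to F_{4,5} and 398.5/54.7 ≈ 7.3 from F_{4,5} to F_{4,6} in total.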

As a consequence, we generated quantified floating-point problems to verify
the 167 invertibility conditions for formats F_{3,5}, F_{4,5}, F_{4,6}, F_{5,11} (Float16),
F_{8,24} (Float32), and F_{11,53} (Float64) and all rounding modes. Each problem checks
the T_FP-unsatisfiability of the formula ¬(φ_c ⇔ ∃x. ℓ[x]), where ℓ[x] corresponds to
the floating-point literal, and φ_c to its invertibility condition. In total, we generated


QE_FP(∃x. P(t_1, ..., t_j[x], ..., t_n)), where x ∉ FV(t_i) for i ≠ j:
  If t_j[x] = x, return getIC(x, P).
  Otherwise, t_j[x] = ◇(s_1, ..., s_k[x], ..., s_m) where m > 0, x ∉ FV(s_i) for i ≠ k.
    Let Q[y] = P(t_1, ..., t_{j−1}, ◇(s_1, ..., s_{k−1}, y, s_{k+1}, ..., s_m), t_{j+1}, ..., t_n),
    where y is a fresh variable.
    Return getIC(y, Q[y]) ∧ QE_FP(∃x. s_k[x] ≈ y).

Fig. 6. Recursive procedure QE_FP for computing quantifier elimination for x in the unit
linear formula ∃x. P(t_1, ..., t_j[x], ..., t_n). The free variables in this formula and the
fresh variable y are implicitly universally quantified. Placeholder ◇ denotes a
floating-point operator from Table 1.
3786 problems (116 · 5 + 51 = 631 for each of the six floating-point formats³) and
checked them using CVC4 [5] (master 546bf686) and Z3 [16] (version 4.8.4).

We consider an invertibility condition to be verified for a floating-point format 
and rounding mode if at least one solver reports unsatisfiable. Given a CPU time
limit of one hour and a memory limit of 8 GB for each solver/benchmark pair, we
were able to verify 3577 (94.5%) invertibility conditions overall, with 99.2% of
F_{3,5}, 99.7% of F_{4,5}, 100% of F_{4,6}, 93.8% of F_{5,11}, 90.2% of F_{8,24}, and 84% of F_{11,53}.
This verification with CVC4 and Z3 required a total of 32 days of CPU time. 
All verification jobs were run on cluster nodes with Intel Xeon E5-2637 3.5 GHz 
and 32 GB memory. 


5 Quantifier Elimination for Unit Linear Floating-Point Formulas


Based on the invertibility conditions presented in Sect. 3, we can define a quantifier
elimination procedure for a restricted fragment of floating-point formulas.
The procedure applies to unit linear formulas, that is, formulas of the form
∃x. P[x] where P is a Σ_FP-literal containing exactly one occurrence of x.

Figure 6 gives a quantifier elimination procedure QE_FP for unit linear floating-point
formulas ∃x. P[x]. We write getIC(y, Q[y]) to indicate the invertibility condition
for y in Q[y], which amounts to a table lookup for the appropriate condition
as given in Sect. 3. Note that our procedure is currently a partial function
because we do not yet have invertibility conditions for some unit linear formulas.
The recursive procedure returns a conjunction of conditions based on the path
on which x occurs in P. If x occurs beneath multiple nested function applications,
a fresh variable y is introduced and used for referencing the intermediate
result of the subterm we are currently solving for. We demonstrate this in the
following example.


Example 2. Consider the unit linear formula ∃x. (x ·^R u) +^R s > t. Invoking the
procedure QE_FP on this input yields, after two recursive calls, the conjunction

$$\mathit{getIC}(y_1, y_1 +^{R} s > t) \wedge \mathit{getIC}(y_2, y_2 \cdot^{R} u \approx y_1) \wedge \mathit{getIC}(x, x \approx y_2)$$

where y_1 and y_2 are fresh variables. The third conjunct is trivially equivalent
to ⊤. This formula is quantifier-free and has the properties specified by the
following theorem.
Theorem 1. Let ∃x. P be a unit linear formula and let I be a model of T_FP.
Then, I satisfies ¬∃x. P if and only if there exists a model J of T_FP (constructible
from I) that satisfies ¬QE_FP(∃x. P).
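The recursion of Fig. 6 can be sketched over a simple term representation. In this hypothetical sketch, terms are nested tuples, rounding-mode arguments are omitted, and getIC is left symbolic (the procedure itself would look it up in the tables of Sect. 3). Applied to the formula of Example 2, with `*` standing for ·^R and `+` for +^R, it reproduces the three conjuncts shown there.

```python
from itertools import count
from typing import Iterator, List, Tuple, Union

# A term is a variable/constant name or (operator, tuple of argument terms);
# a literal is (predicate, tuple of argument terms).
Term = Union[str, Tuple[str, tuple]]
Literal = Tuple[str, tuple]


def occurs(x: str, t: Term) -> bool:
    """True if variable x occurs in term t."""
    return t == x if isinstance(t, str) else any(occurs(x, a) for a in t[1])


def show(t: Term) -> str:
    if isinstance(t, str):
        return t
    op, args = t
    if len(args) == 1:
        return f"{op}({show(args[0])})"
    return f" {op} ".join(a if isinstance(a, str) else f"({show(a)})" for a in args)


def show_literal(lit: Literal) -> str:
    pred, args = lit
    return f" {pred} ".join(show(a) for a in args)


def qe(x: str, lit: Literal, fresh: Iterator[str]) -> List[str]:
    """Conjuncts of QE_FP(exists x. lit) for a literal with exactly one occurrence of x."""
    pred, args = lit
    j = next(i for i, t in enumerate(args) if occurs(x, t))    # argument t_j[x]
    if args[j] == x:
        return [f"getIC({x}, {show_literal(lit)})"]
    op, sargs = args[j]
    k = next(i for i, s in enumerate(sargs) if occurs(x, s))   # sub-argument s_k[x]
    y = next(fresh)
    q_y = (pred, args[:j] + ((op, sargs[:k] + (y,) + sargs[k + 1:]),) + args[j + 1:])
    return [f"getIC({y}, {show_literal(q_y)})"] + qe(x, ("≈", (sargs[k], y)), fresh)


# Example 2: exists x. (x * u) + s > t
EXAMPLE_2: Literal = (">", (("+", (("*", ("x", "u")), "s")), "t"))
print(" ∧ ".join(qe("x", EXAMPLE_2, (f"y{i}" for i in count(1)))))
```

Each recursive step peels off one operator on the path from the predicate to x, so the number of conjuncts is one more than the nesting depth of x; the final conjunct has the trivial form getIC(x, x ≈ y_k).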


³ 116 invertibility conditions from rounding-mode-dependent operators and 51 invertibility
conditions where the operator is rounding-mode independent (e.g., rem).
Niemetz et al. [26] present a similar algorithm for solving unit linear bit-vector
literals. In that work, a counterexample-guided loop was devised that made 
use of Hilbert-choice expressions for representing quantifier instantiations. In 
contrast to that work, we provide here only a quantifier elimination procedure. 
Extending our techniques to a general quantifier instantiation strategy is the 
subject of ongoing work. We discuss our preliminary work in this direction in 
the next section. 


6 Solving Quantified Floating-Point Formulas 


We implemented a prototype extension of the SMT solver CVC4 that lever- 
ages the results of the previous section to determine the satisfiability of quanti- 
fied floating-point formulas. To handle quantified formulas, CVC4 uses a basic 
model-based instantiation loop (see, e.g., [30,32] for instantiation approaches for 
other theories). This technique maintains a quantifier-free set of constraints F
corresponding to instantiations of universally quantified formulas. It terminates
with the response “unsatisfiable” if F is unsatisfiable, and terminates with
“satisfiable” if it can show that the given quantified formulas are satisfied by a model
of T_FP that satisfies F. For T_FP, the instantiations are substitutions of universally
quantified variables by concrete floating-point values, e.g., ∀x. P(x) ⇒ P(0),
which can be highly inefficient in the worst case for higher precisions.

We extend this basic loop with a preprocessing pass that generates theory 
lemmas based on the invertibility conditions corresponding to literals of quanti- 
fied formulas Vx.P with exactly one occurrence of x, as explained in the example 
below. 


Example 3. Suppose the current set S of formulas contains a formula φ of the
form ∀x. ¬((x ·^R u) +^R s > t ∧ Q(x)) where u, s, and t are ground terms; then we
add the following formula to S, where y_1 and y_2 are fresh (free) variables:

$$\big(\mathit{getIC}(y_1, y_1 +^{R} s > t) \Rightarrow y_1 +^{R} s > t\big) \wedge \big(\mathit{getIC}(y_2, y_2 \cdot^{R} u \approx y_1) \Rightarrow y_2 \cdot^{R} u \approx y_1\big)$$

The addition of this lemma is satisfiability preserving because, if the invertibility
condition holds for y_1 +^R s > t (resp., y_2 ·^R u ≈ y_1), then y_1 (resp., y_2) can be
chosen to be a solution for that literal. We then add the instantiation lemma
φ ⇒ ¬((y_2 ·^R u) +^R s > t ∧ Q(y_2)). Although x is not necessarily linear in the body
of φ, if both invertibility conditions hold, then the combination of the above lemmas
implies (y_2 ·^R u) +^R s > t, which together with the instantiation lemma allows the
solver to infer that the remaining portion Q of the quantified formula cannot hold
for y_2. An inference of this form may be more productive than enumerating the
possible values of x in instantiations.
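The lemma of Example 3 follows the same path through the literal as QE_FP. A hypothetical sketch of the preprocessing step, again with nested-tuple terms, rounding modes omitted, and getIC left symbolic, produces the invertibility lemma together with the fresh variable that is then substituted for x in the instantiation lemma.

```python
from itertools import count
from typing import Iterator, List, Tuple


def occurs(x: str, t) -> bool:
    """True if variable x occurs in term t (a name or (operator, args))."""
    return t == x if isinstance(t, str) else any(occurs(x, a) for a in t[1])


def show(t) -> str:
    if isinstance(t, str):
        return t
    op, args = t
    if len(args) == 1:
        return f"{op}({show(args[0])})"
    return f" {op} ".join(a if isinstance(a, str) else f"({show(a)})" for a in args)


def show_literal(lit) -> str:
    pred, args = lit
    return f" {pred} ".join(show(a) for a in args)


def ic_lemma(x: str, lit, fresh: Iterator[str]) -> Tuple[str, str]:
    """Return (lemma, witness) for a literal with exactly one occurrence of x.

    Each step replaces the subterm containing x by a fresh y, yielding the
    conjunct getIC(y, Q[y]) => Q[y]; the last fresh variable is the witness
    that the instantiation lemma substitutes for x."""
    conjuncts: List[str] = []
    witness = x
    while True:
        pred, args = lit
        j = next(i for i, t in enumerate(args) if occurs(x, t))
        if args[j] == x:
            return " ∧ ".join(conjuncts), witness
        op, sargs = args[j]
        k = next(i for i, s in enumerate(sargs) if occurs(x, s))
        y = next(fresh)
        new_arg = (op, sargs[:k] + (y,) + sargs[k + 1:])
        q = show_literal((pred, args[:j] + (new_arg,) + args[j + 1:]))
        conjuncts.append(f"(getIC({y}, {q}) ⇒ {q})")
        lit, witness = ("≈", (sargs[k], y)), y


# The literal (x * u) + s > t from Example 3.
lemma, witness = ic_lemma("x", (">", (("+", (("*", ("x", "u")), "s")), "t")),
                          (f"y{i}" for i in count(1)))
print(lemma)
print("instantiate x with", witness)
```

The sketch handles only the single literal with exactly one occurrence of x; the remaining part Q(x) of the quantified body is left to the instantiation lemma, as in the example.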


Evaluation. We considered all 61 benchmarks from SMT-LIB [6] that contained 
quantified formulas over floating-points (logic FP), which correspond to verifi- 
cation conditions from the software verification competition that use a floating- 
point encoding [19]. The invertibility conditions required for solving their liter- 
als include floating-point addition, multiplication and division (both arguments) 
with equality and inequality. We implemented all cases of invertibility conditions
needed for these benchmarks. We extended our SMT solver CVC4 (GitHub master
5d248c36) with the above preprocessing pass (GitHub cav19fp 9b5acd74), and
compared its performance with the preprocessing pass enabled (configuration
CVC4-ext) and disabled (configuration CVC4-base) against the SMT solver Z3
(version 4.8.4). All experiments were run on the same cluster mentioned earlier,
with a memory limit of 8 GB and a 1800 s time limit. Overall, CVC4-base solved
35 benchmarks within the time limit (with no benchmarks uniquely solved com- 
pared to CVC4-ext), CVC4-ext solved 42 benchmarks (7 of these uniquely solved 
compared to the base version), and Z3 solved 56 benchmarks. While CVC4-ext 
solves significantly fewer benchmarks than Z3, we believe that the improvement 
over CVC4-base is indicative that our approach for invertibility conditions shows 
potential for solving quantified floating-point constraints in SMT solvers. A more 
comprehensive evaluation and implementation is left as future work. 


7 Conclusion 


We have presented invertibility conditions for a large subset of combinations of 
floating-point operators over floating-point predicates supported by SMT solvers. 
These conditions were found by a framework that utilizes syntax-guided synthe- 
sis solving, customized for our problem and developed over the course of this 
work. We have shown that invertibility conditions imply that a simple fragment
of quantified floating-point formulas admits compact quantifier elimination, and
have given preliminary evidence that an SMT solver that partially leverages this 
technique can have a higher success rate on floating-point problems coming from 
a software verification application. 

For future work, we plan to extend techniques for quantified and quantifier- 
free floating-point formulas to incorporate our findings, in particular to lift pre- 
vious quantifier instantiation approaches (e.g., [26]) and local search procedures 
(e.g., [25]) for bit-vectors to floating-points. We also plan to extend and use our 
synthesis framework for related challenging synthesis tasks, such as finding con- 
ditions under which more complex constraints have solutions, including those 
having multiple occurrences of a variable to solve for. Our synthesis framework 
is agnostic to theories and can be used for any sort with a small finite domain. 
It can thus be leveraged also for solutions to quantified bit-vector constraints. 
Finally, we would like to establish formal proofs of correctness of our invertibility 
conditions that are independent of floating-point formats. 
Abstract. We formulate numerically-robust inductive proof rules for
unbounded stability and safety properties of continuous dynamical sys- 
tems. These induction rules robustify standard notions of Lyapunov func- 
tions and barrier certificates so that they can tolerate small numerical 
errors. In this way, numerically-driven decision procedures can establish 
a sound and relative-complete proof system for unbounded properties of 
very general nonlinear systems. We demonstrate the effectiveness of the 
proposed rules for rigorously verifying unbounded properties of various 
nonlinear systems, including a challenging powertrain control model. 


1 Introduction 


Infinite-time stability and safety properties of continuous dynamical systems are 
typically established via inductive arguments over continuous time. For instance, 
proving stability of a dynamical system is similar to proving termination of a 
program. A system is stable at the origin in the sense of Lyapunov, if one can 
find a Lyapunov function (essentially a ranking function) that is everywhere pos- 
itive except for reaching exactly zero at the origin, and never increases over time 
along the direction of the system dynamics [11]. Likewise, proving unbounded 
safety of a dynamical system requires one to find a barrier function (or differ- 
ential invariant [19]) that separates the system’s initial state from the unsafe 
regions, and whenever the system states reach the barrier, the system dynam- 
ics always points towards the safe side of the barrier [21]. In both cases, once 
a candidate certificate (Lyapunov or barrier functions) is proposed, the verifi- 
cation problem is reduced to checking the validity of a universally-quantified 
first-order formula over real-valued variables. The standard approaches for the
validation step use symbolic quantifier elimination [4] or Sum-of-Squares
techniques [17,18,24]. However, these algorithms are either extremely expensive or
numerically brittle. Most importantly, they cannot handle systems with
non-polynomial nonlinearity, and thus fall short of a general framework for verifying
practical systems of significant complexity.
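As a minimal worked instance of the Lyapunov argument sketched above (our own illustration, not an example from this paper), take the scalar system ẋ = −x with candidate V(x) = x². Checking the certificate reduces to the validity of a universally quantified formula over the reals, together with V(0) = 0:

$$\forall x \in \mathbb{R}.\;\; \big(x \neq 0 \rightarrow V(x) = x^2 > 0\big) \;\wedge\; \dot V(x) = \frac{dV}{dx}\,\dot x = 2x\cdot(-x) = -2x^2 \le 0.$$

Here both conjuncts are polynomial and easy to decide; the difficulties described in this section arise once the dynamics or the certificate involve non-polynomial terms.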

The standard approach of checking invariance conditions in program anal- 
ysis is to use Satisfiability Modulo Theories (SMT) solvers [16]. However, to 
check the inductive conditions for nonlinear dynamical systems, one has to solve 
nonlinear SMT problems over real numbers, which are highly intractable or 
undecidable [23]. Recent work on numerically-driven decision procedures pro- 
vides a promising direction to bypass this difficulty [5,6]. They have been used 
for many bounded-time verification and synthesis problems for highly nonlinear 
systems [12]. However, the fundamental challenge with using numerically-driven 
methods in inductive proofs is that numerical errors make it impossible to verify 
the induction steps in the standard sense. Take the Lyapunov analysis of stability 
properties as an example. A dynamical system is stable if there exists a function
that vanishes exactly at the origin and strictly decreases over
time. Since any numerical error blurs the difference between strict and non-strict
inequality, one can conclude that numerically-driven methods are not suitable
for verifying these strict constraints. However, proving a system is stable within
an arbitrarily tiny neighborhood around the origin is all we really need in
practice. Thus, there is a discrepancy between what the standard theory requires
and what is needed in practice, or what can be achieved computationally. To
bridge this gap, we need to rethink the fundamental definitions.

In this paper, we formulate new inductive proof rules for continuous dynam- 
ical systems for establishing robust notions of stability and safety. These proof 
rules are practically useful and computationally certifiable in a very general 
sense. For instance, for stability, we define the notion of ε-stability that requires
the system to be stable within an ε-bounded distance from the origin, instead of
exactly at the origin. When ε is small enough, ε-stable systems are practically
indistinguishable from stable systems. We then define the notion of ε-Lyapunov
functions that are sufficient for establishing ε-stability. We then rigorously prove
that the ε-Lyapunov conditions are numerically stable and can be correctly
determined by δ-complete decision procedures for nonlinear real arithmetic [7]. In
this way, we can rely on various numerically-driven SMT solvers to establish a sound
and relative-complete proof system for unbounded stability and safety properties
of highly nonlinear dynamical systems. We believe these new definitions
have eliminated the core difficulty for reasoning about infinite-time properties of 
nonlinear systems, and will pave the way for adapting a wide range of automated 
methods from program analysis to continuous and hybrid systems. In short, the 
paper makes the following contributions: 


— We define ε-stability and ε-Lyapunov functions in Sect. 3. We prove that
finding ε-Lyapunov functions is sufficient for establishing ε-stability.

— We define two types of robust proof rules for unbounded safety in Sect. 3,
which we call Type 1 and Type 2 ε-barrier functions. The former relies on
strict contraction, and the latter relies on reachable-set computation to
guarantee bounded escape.

— We prove that δ-complete decision procedures provide a sound and
relative-complete proof system for the proposed numerically-robust induction rules,
in both Sects. 3 and 4.


We demonstrate the effectiveness of the proposed methods on various nonlinear 
systems in Sect. 5. Section 2 covers the basic definitions and Sect. 6 concludes the 
paper. 

Related Work. Several lines of work have proposed relaxed and practical 
notions to capture the spirit of the stability requirements. Early work from the 
1960s introduced practical stability, which defined bounds on system behaviors 
over finite time horizons [2,14,26,27]. These methods can show whether a sys- 
tem leaves a safe set or enters a goal set over a finite time horizon based on 
Lyapunov-like functions. Stability defined in this sense is equivalent to estimat- 
ing the reachable set over a finite time horizon. Thus, the shortcoming is that 
it may not capture the desired behavior of the system over unbounded time. 
Similarly, notions of boundedness and ultimate boundedness specify limits on 
the system behaviors [11]. Boundedness specifies whether the system remains 
within a given bounded region. Ultimate boundedness specifies that the system 
eventually returns to the given bounded region. These properties can be estab- 
lished based on Lyapunov-like conditions. Related notions have been generalized 
to switched systems [29,30]. Also, the related notion of region stability defines 
systems that eventually enter and remain within a specified set [20]. We present 
stability concepts that unify and extend the above notions. A related relaxation 
of the traditional notions of stability includes almost Lyapunov functions [15], 
which allow the strict stability conditions to be neglected in a region near the 
equilibrium point. The challenge of applying this technique in practice is that 
the size and shape of the neglected region are not specified a priori, so a con- 
structive technique for specifying a stability region is not straightforward. Our 
work is related to efforts to construct and check robust barrier certificates using 
Lyapunov-like functions to ensure that controllers satisfy safety constraints [28]. 
This work provides a framework in which to specify analytic constraints on con- 
troller behaviors. By contrast, our work focuses on providing constraints that 
can be checked fully automatically. Our notion of ε-barrier functions is closely
related to τ-barrier certificates from [1], though we choose to focus on distance
bounds from the barrier (ε) rather than time bounds that indicate how long it
takes for behaviors to re-enter the barrier once they have left it (τ).


2 Background 


2.1 Dynamical Systems 


Throughout the paper, we use the following definition of an n-dimensional 
autonomous dynamical system: 
$$\frac{dx(t)}{dt} = f(x(t)), \quad x(0) \in init \ \text{ and } \ \forall t \in \mathbb{R}_{\geq 0},\ x(t) \in D, \qquad (1)$$
where an open set D ⊆ ℝⁿ is the state space, init ⊆ D is a set of initial states, and
f : D → ℝⁿ is a vector field specified by Lipschitz-continuous functions on each
dimension. For notational simplicity, all variable and function symbols can
represent vectors. When vectors are used in logic formulas, they represent
conjunctions of the formulas for each dimension. For instance, when x = (x_1, ..., x_n),
we write x = 0 to denote the formula x_1 = 0 ∧ ... ∧ x_n = 0. For any system
defined by (1), we write its solution function as

$$F : D \times \mathbb{R}_{\geq 0} \to \mathbb{R}^n, \qquad F(x(0), t) = x(0) + \int_0^t f(x(s))\,ds. \qquad (2)$$


Note that F usually does not have an analytic form. However, since f is Lipschitz- 
continuous, F exists and is unique. We will often use Lie derivatives to measure 
the change of a scalar function along the flow defined by another vector field: 


Definition 1 (Lie Derivative). Let f : D → ℝⁿ define a vector field. Write
the i-th component of f as f_i. Let V : D → ℝ be a differentiable scalar function.
The Lie derivative of V over f is defined as

$$\nabla_f V(x) = \sum_{i=1}^{n} \frac{\partial V}{\partial x_i}(x)\, f_i(x).$$
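As a small numerical illustration of Definition 1, the sketch below evaluates ∇_f V at a point from a hand-written gradient. The damped oscillator f(x) = (x_2, −x_1 − x_2) and the candidate V(x) = x_1² + x_2² are hypothetical choices made only for this example; for them, ∇_f V(x) = −2x_2².

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def lie_derivative(grad_v: Callable[[Vector], Vector],
                   f: Callable[[Vector], Vector],
                   x: Vector) -> float:
    """Lie derivative of V along f at x: sum_i (dV/dx_i)(x) * f_i(x)."""
    return sum(g * fi for g, fi in zip(grad_v(x), f(x)))


def f(x: Vector) -> Vector:
    """Hypothetical vector field: a damped linear oscillator."""
    return (x[1], -x[0] - x[1])


def grad_v(x: Vector) -> Vector:
    """Gradient of the hypothetical candidate V(x) = x1^2 + x2^2."""
    return (2.0 * x[0], 2.0 * x[1])


# Analytically the Lie derivative is -2 * x2^2, so this prints -8.0.
print(lie_derivative(grad_v, f, (1.0, 2.0)))
```

In the paper's setting the gradient and the vector field are symbolic terms handed to a solver; evaluating them at single points, as here, only checks the formula.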


2.2 First-Order Language over the Reals L_{ℝ_F}


We will make extensive use of first-order formulas over real numbers with Type 2 
computable functions [25] to express and infer properties of nonlinear dynamical 
systems. Definition 2 introduces the syntax of these formulas. 


Definition 2 (Syntax of L_{ℝ_F}). Let F be the class of all Type 2 computable
functions over real numbers. We define:

$$t := x_i \mid f(t(x)), \text{ where } f \in \mathcal{F}, \text{ possibly constant;}$$
$$\varphi := \top \mid \bot \mid t(x) > 0 \mid t(x) \geq 0 \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \exists x_i \varphi \mid \forall x_i \varphi.$$

We regard ¬φ as an operation that is defined inductively as usual. For
instance, ¬(t > 0) is defined as −t ≥ 0, and ¬(∃x_i φ) is defined as ∀x_i ¬φ. For
any L_{ℝ_F} terms u and v, variable x, and L_{ℝ_F} predicate φ, we write ∃^{[u,v]}x φ
and ∀^{[u,v]}x φ to denote ∃x(u ≤ x ∧ x ≤ v ∧ φ) and ∀x((u ≤ x ∧ x ≤ v) → φ),
respectively; the same notation is used for open intervals. Next, Definition 3
introduces syntactic perturbation of formulas in L_{ℝ_F}.


Definition 3 (δ-Strengthening and Robust Formulas [7]). Let δ ∈ ℚ⁺ be
arbitrary. Let φ be an arbitrary L_{ℝ_F} formula. The δ-strengthening of φ, denoted
by φ^{+δ}, is obtained from φ by replacing every atomic predicate of the form t(x) > 0
and t(x) ≥ 0 with t(x) − δ > 0 and t(x) − δ ≥ 0, respectively. We say φ is
δ-robust iff φ^{+δ} ↔ φ.
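The δ-strengthening is a purely syntactic rewrite. The sketch below applies it to quantifier-free formulas only (quantifiers are omitted for brevity) and represents formulas as nested Python tuples, an encoding chosen here purely for illustration.

```python
def strengthen(phi, delta: float):
    """Return the delta-strengthening phi^{+delta} of a quantifier-free formula.

    Formulas are nested tuples: ("gt", t), ("ge", t), ("and", p, q), ("or", p, q),
    where t is a Python function of one real variable.
    """
    tag = phi[0]
    if tag in ("gt", "ge"):
        t = phi[1]
        # t(x) > 0 becomes t(x) - delta > 0 (and likewise for >=).
        return (tag, lambda x, t=t: t(x) - delta)
    if tag in ("and", "or"):
        return (tag, strengthen(phi[1], delta), strengthen(phi[2], delta))
    raise ValueError(f"unsupported connective: {tag}")


def holds(phi, x: float) -> bool:
    """Evaluate a quantifier-free formula at the point x."""
    tag = phi[0]
    if tag == "gt":
        return phi[1](x) > 0
    if tag == "ge":
        return phi[1](x) >= 0
    if tag == "and":
        return holds(phi[1], x) and holds(phi[2], x)
    if tag == "or":
        return holds(phi[1], x) or holds(phi[2], x)
    raise ValueError(f"unsupported connective: {tag}")


# phi := (x > 0) and (1 - x >= 0), i.e. x in (0, 1]
phi = ("and", ("gt", lambda x: x), ("ge", lambda x: 1.0 - x))
phi_strong = strengthen(phi, 0.1)  # describes x in (0.1, 0.9]
print(holds(phi, 0.05), holds(phi_strong, 0.05))  # True False
```

Because strengthening shrinks the satisfying set, φ^{+δ} → φ always holds for formulas in this negation-free form. Robustness is the converse direction.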


Definition 4 (δ-Complete Decision Procedures [7]). Let S be a class of
L_{ℝ_F}-sentences. We say a decision procedure is δ-complete over S iff for any
φ ∈ S, the procedure correctly returns one of the following answers:
- true: φ is true.
- δ-false: φ^{+δ} is false.

When the two cases overlap, either answer can be returned.

It follows that if φ is δ-robust, then a δ-complete decision procedure can
correctly determine the truth value of φ: since φ^{+δ} ↔ φ, the two answers can no longer overlap.
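To make the two possible answers concrete, here is a deliberately simplified δ-complete procedure for one formula shape, ∀x ∈ [lo, hi]. t(x) > 0, with x one-dimensional. It assumes a known Lipschitz constant L for t and uses plain bisection. This is not the algorithm implemented in dReal; it is only a sketch of the specification in Definition 4.

```python
from typing import Callable, Optional, Tuple


def delta_decide_forall_positive(
    t: Callable[[float], float],
    lo: float,
    hi: float,
    lipschitz: float,
    delta: float,
    max_depth: int = 64,
) -> Tuple[str, Optional[float]]:
    """Decide 'forall x in [lo, hi]. t(x) > 0' up to delta.

    Returns ("true", None) if the formula is proven, or ("delta-false", w) with
    a witness w where t(w) - delta <= 0, so the delta-strengthening is false.
    Assumes |t(x) - t(y)| <= lipschitz * |x - y| on [lo, hi].
    """
    stack = [(lo, hi, 0)]
    while stack:
        a, b, depth = stack.pop()
        mid = 0.5 * (a + b)
        value = t(mid)
        if value <= delta:
            return "delta-false", mid
        # Lipschitz lower bound of t on [a, b]; positive means the box is proven.
        if value - lipschitz * (b - a) / 2 > 0:
            continue
        if depth >= max_depth:
            raise RuntimeError("subdivision limit reached")
        stack.append((a, mid, depth + 1))
        stack.append((mid, b, depth + 1))
    return "true", None


print(delta_decide_forall_positive(lambda x: x * x + 0.5, -1.0, 1.0,
                                   lipschitz=2.0, delta=0.1))  # ('true', None)
```

A box is split only when t(mid) > δ while the Lipschitz bound is still non-positive. This forces its width above 2δ/L, so the subdivision is finite. For a formula that is true but not δ-robust, such as t(x) = x² + 0.05 with δ = 0.1, the procedure may legitimately answer δ-false.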


3 Robust Proofs for Stability 


We first focus on stability. We will define the notion of ε-stability as a relaxation
of standard Lyapunov stability, and then define ε-Lyapunov functions, which
are sufficient for proving ε-stability in a robust way.


3.1 Stability and Lyapunov Functions 


Conventionally, the definition of stability is stated with ε and δ to highlight
its connection with the ε-δ conditions for continuity. We will mostly reserve ε for
conditions that are robust under ε-bounded numerical errors. Thus, we replace ε by τ in
the standard definitions to avoid confusion.


Definition 5 (Stability). We say the system in (1) is stable at the origin in
the sense of Lyapunov iff for any τ-ball neighborhood of the origin, there exists
a δ-ball around the origin such that, if the system starts within the δ-ball, then it
never escapes the τ-ball. We capture the definition by the following L_{ℝ_F}-formula:

$$\mathrm{Stable}(f) =_{\mathrm{df}} \forall^{(0,\infty)}\tau\ \exists^{(0,\infty)}\delta\ \forall^{D} x_0\ \forall^{[0,\infty)} t\ \big(\|x_0\| < \delta \rightarrow \|F(x_0,t)\| < \tau\big)$$


Definition 6 (Lyapunov Function). Consider a dynamical system given in
the form of (1), and let V : D → ℝ be a differentiable function. We say V is a
non-strict Lyapunov function for the system iff the following predicate is true:

$$\mathrm{LF}(f,V) =_{\mathrm{df}} (V(0) = 0) \wedge (f(0) = 0) \wedge \forall^{D\setminus\{0\}} x\ \big(V(x) > 0 \wedge \nabla_f V(x) \leq 0\big)$$


Proposition 1. For any dynamical system defined by f, if there exists a Lyapunov
function V, then the system is stable. Namely, LF(f, V) → Stable(f).


3.2 Epsilon-Stability 


The standard definition of stability requires a system to stabilize within
arbitrarily small neighborhoods around the origin. However, very small
neighborhoods are practically indistinguishable from the origin. Thus, it is practically
sufficient to prove that a system is stable within some sufficiently small
neighborhood. We capture this intuition with a minor change to the standard
definition: we simply put a lower bound ε on the τ parameter in Definition 5.
As a result, the system is required to behave like a standard stable system
outside the ε-ball, but can behave arbitrarily within the ε-ball (for
instance, oscillate around the origin). The formal definition is as follows:
[Figure: (a) Stability (b) Lyapunov Function (c) ε-Stability (d) ε-Lyapunov Function]

Fig. 1. Standard and ε-relaxed notions of stability and Lyapunov functions


Definition 7 (Epsilon-Stability). Let ε ∈ ℝ⁺ be arbitrary. We say a dynamical
system in (1) is ε-stable at the origin in the sense of Lyapunov iff it satisfies
the following condition:

$$\mathrm{Stable}_\varepsilon(f) =_{\mathrm{df}} \forall^{[\varepsilon,\infty)}\tau\ \exists^{(0,\infty)}\delta\ \forall^{D} x_0\ \forall^{[0,\infty)} t\ \big(\|x_0\| < \delta \rightarrow \|F(x_0,t)\| < \tau\big)$$

In words, for any τ ≥ ε, there exists δ such that all trajectories that start within
the δ-ball will stay within the τ-ball around the origin.


Note that the only difference from the standard definition is that τ is bounded
from below by a positive ε instead of 0. The definition is depicted in Fig. 1c, which
shows the difference from the standard notion in Fig. 1a. Since only the range of the
universally quantified τ is restricted, ε-stability is strictly weaker than standard stability.


Proposition 2. For any ε ∈ ℝ⁺, Stable(f) → Stable_ε(f).


Thus, any system that is stable in the standard sense is also ε-stable for
any ε ∈ ℝ⁺. On the other hand, one can always choose ε small enough
that an ε-stable system is practically indistinguishable from a system that is
stable in the standard sense.
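To make the gap between the two notions concrete, consider the one-dimensional system ẋ = ηx − x³ with η > 0. This is our own hypothetical example, not one from this paper. The origin is an equilibrium, but it is unstable because nearby states are pushed toward ±√η. However, no trajectory that starts with |x_0| < τ ever leaves the τ-ball once τ > √η, so the system is ε-stable for any ε > √η. Since u = x² satisfies the logistic equation u̇ = 2u(η − u), the solution has the closed form u(t) = η / (1 + (η/u_0 − 1)e^{−2ηt}). The sketch evaluates it on a time grid as a numerical illustration, not a proof.

```python
import math


def radius(x0: float, eta: float, t: float) -> float:
    """|x(t)| for the hypothetical system dx/dt = eta*x - x**3 (logistic closed form)."""
    u0 = x0 * x0
    if u0 == 0.0:
        return 0.0  # the origin is an equilibrium
    u = eta / (1.0 + (eta / u0 - 1.0) * math.exp(-2.0 * eta * t))
    return math.sqrt(u)


def max_radius(x0: float, eta: float, horizon: float, steps: int = 10_000) -> float:
    """Largest |x(t)| observed on a uniform time grid over [0, horizon]."""
    return max(radius(x0, eta, horizon * k / steps) for k in range(steps + 1))


eta, eps = 0.04, 0.25  # sqrt(eta) = 0.2 < eps
peak = max_radius(1e-3, eta, horizon=200.0)
# Starting at radius 1e-3, the state drifts out to almost 0.2: it leaves every
# tau-ball with tau < 0.2 (not Lyapunov stable) but never leaves the eps-ball.
print(f"start radius 1e-3, peak radius {peak:.4f}")
```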


3.3 Epsilon-Lyapunov Function 


We now define the corresponding notion of Lyapunov function that can be used
for proving ε-stability. The robustness problem in the standard definition comes
from the singularity of the origin. With the relaxed notion of stability, the system
may oscillate within some ε-neighborhood of the origin. This relaxation leaves
room for constructing a few nested neighborhoods that trap the
trajectories in a way that is robust under sufficiently small perturbations. To
achieve this, we make use of balls of different sizes, as shown in the following
definition. We write B_ε to denote the open ε-ball around the origin.


Definition 8 (Epsilon-Lyapunov Functions). Let V : D → ℝ be a differentiable
scalar function defined for the system in (1), and let ε ∈ ℝ⁺ be an arbitrary
value. We say V is an ε-Lyapunov function for the system iff it satisfies the
following conditions:

1. Outside the ε-ball, there is some positive lower bound on the value of V.
Namely, there exists α ∈ ℝ⁺ such that for any x ∈ D \ B_ε, V(x) > α.

2. Inside the ε-ball, there is a strictly smaller ε'-ball in which the value of V
is bounded from above, to create a gap with its values outside the ε-ball.
Formally, there exist ε' ∈ (0, ε) and β ∈ (0, α) such that for all x ∈ B_{ε'},
V(x) < β.

3. The Lie derivative of V is strictly negative outside of B_{ε'}. Formally, there
exists γ ∈ ℝ⁺ such that for all x ∈ D \ B_{ε'}, the Lie derivative of V along f
satisfies ∇_f V(x) < −γ.


In sum, the three conditions can be expressed with the following L_{ℝ_F}-formula:

$$\mathrm{LF}_\varepsilon(f,V) =_{\mathrm{df}} \exists^{(0,\varepsilon)}\varepsilon'\ \exists^{(0,\infty)}\alpha\ \exists^{(0,\alpha)}\beta\ \exists^{(0,\infty)}\gamma\ \Big( \forall^{D\setminus B_\varepsilon} x\, \big(V(x) > \alpha\big) \wedge \forall^{B_{\varepsilon'}} x\, \big(V(x) < \beta\big) \wedge \forall^{D\setminus B_{\varepsilon'}} x\, \big(\nabla_f V(x) < -\gamma\big) \Big)$$

It is important to note that ε', α, β, and γ are not fixed constants, but
existentially quantified variables. Thus the condition can hold for infinitely
many values of these parameters, which is critical to robustness. The only free
variable in the formula is ε, used in B_ε and in the bound for ε'. Note also that
neither LF_ε(f, V) nor the standard definition LF(f, V) implies the other.


Remark 1. The logical structure of LF_ε(f, V) seems more complex than
the standard Lyapunov conditions in Definition 6 because of the extra existential
quantification. In Sect. 3.4, following Theorem 3, we show that it does not add
computational complexity in checking the conditions.


The key result is that the conditions for an ε-Lyapunov function are sufficient
for establishing ε-stability.


Theorem 1. If there exists an ε-Lyapunov function V for a dynamical system
defined by f, then the system is ε-stable. Namely, LF_ε(f, V) → Stable_ε(f).


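Proof. Let τ ≥ ε be arbitrary, and let α, γ ∈ ℝ⁺, β ∈ (0, α), and ε' ∈ (0, ε) be
as specified by the definition of LF_ε(f, V); we show that δ := ε' witnesses
ε-stability. Let x_0 ∈ B_{ε'} be an arbitrary point. For any t ∈ ℝ_{≥0}, let
x(t) = F(x_0, t) be the system state as defined in (2). We use contradiction to prove
that for any t ∈ ℝ_{≥0}, the inequality ‖x(t)‖ < ε ≤ τ holds. Suppose ‖x(t)‖ ≥ ε for
some t. Since ε' < ε and F(x_0, ·) is continuous, there exist t_1 and t_2 with the
following properties (∂B_{ε'} and ∂B_ε are the boundaries of the corresponding balls):

$$0 < t_1 < t_2 \leq t, \quad x(t_1) \in \partial B_{\varepsilon'}, \quad x(t_2) \in \partial B_{\varepsilon}, \quad \forall^{[t_1,t_2]} t'\ \big(x(t') \in \overline{B_\varepsilon} \setminus B_{\varepsilon'}\big)$$

By continuity of V, we know V(x(t_1)) ≤ β < α < V(x(t_2)), and hence
V(x(t_1)) < V(x(t_2)). However, by the mean value theorem,
V(x(t_2)) − V(x(t_1)) = (t_2 − t_1) ∇_f V(x(s)) for some s ∈ (t_1, t_2), and since
x(s) ∈ D \ B_{ε'}, the third condition of Definition 8 gives ∇_f V(x(s)) < −γ < 0,
which is a contradiction.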
Remark 2. The proof of Theorem 1 shows that once the state of the system enters B_{ε'},
it never leaves B_ε. However, it is still possible for the state to leave B_{ε'}.
On the other hand, since the closure of B_ε \ B_{ε'} is bounded, and for every x in this
region V is continuous at x and ∇_f V(x) < −γ, no trajectory can be trapped in
the closure of B_ε \ B_{ε'}. Therefore, even though the state of the system might leave
B_{ε'}, it will visit the inside of this ball infinitely often.


Example 1. Consider the time-reversed Van der Pol system given by the following
dynamics. Fig. 3 (Left) shows the vector field of this system around the origin.

$$\dot{x}_1 = -x_2, \qquad \dot{x}_2 = x_1 - (1 - x_1^2)\, x_2$$

A Lyapunov function zᵀPz, where zᵀ is [x_1, x_2, x_1², x_1x_2, x_2², x_1³, x_1²x_2,
x_1x_2², x_2³] and P is the 9 × 9 constant matrix given in [8], is a degree-6
polynomial that can be obtained using simulation-guided techniques from [10].
Using dReal [9] with δ := 10^−25 and the Euclidean norm, we are able to prove
that zᵀPz is a 10^−12-Lyapunov function. Table 1 lists the parameters used for
this proof.
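The structure of this candidate can be sketched as follows. The block builds the monomial vector z of Example 1 and evaluates the quadratic form zᵀPz. The actual matrix P is only given in [8], so the identity matrix below is a placeholder for illustration and not the certificate itself. Because the highest monomial degree in z is 3, zᵀPz has degree 6.

```python
from typing import List, Sequence


def monomials_vdp(x1: float, x2: float) -> List[float]:
    """The vector z of Example 1: all monomials in x1, x2 of degree 1 to 3."""
    return [x1, x2,
            x1**2, x1 * x2, x2**2,
            x1**3, x1**2 * x2, x1 * x2**2, x2**3]


def quadratic_form(z: Sequence[float], P: Sequence[Sequence[float]]) -> float:
    """Evaluate z^T P z for a square matrix P given as nested lists."""
    n = len(z)
    if len(P) != n or any(len(row) != n for row in P):
        raise ValueError("P must be an n x n matrix matching len(z)")
    return sum(z[i] * P[i][j] * z[j] for i in range(n) for j in range(n))


# Placeholder only: the real 9 x 9 matrix P is given in [8].
P_placeholder = [[1.0 if i == j else 0.0 for j in range(9)] for i in range(9)]
print(quadratic_form(monomials_vdp(1.0, 2.0), P_placeholder))  # 111.0
```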


3.4 Automated Proofs with Delta-Decisions 


We now prove that, unlike the conventional conditions, the new inductive proof
rules are numerically robust. It follows that δ-decision procedures provide a
sound and relatively complete proof system for establishing the conditions in the
following sense:

- (Soundness) A δ-complete decision procedure is always correct when it
confirms the existence of an ε-Lyapunov function.

- (Relative Completeness) For a given ε-inductive certificate, there exists δ > 0
such that a δ'-complete procedure is able to verify it, for any 0 < δ' < δ.


To prove these properties, the key fact is that the continuity of the functions in
the induction conditions ensures that there is room for numerical errors in the
conditions. Consequently, the formulas allow δ-perturbations in their parameters.
This is captured by Lemma 1, whose proof is given in [8].


Lemma 1. For any ε ∈ ℝ⁺, there exists δ ∈ ℚ⁺ such that LF_ε(f, V) is δ-robust.


Note that if a formula φ is δ-robust, then for every δ' ∈ (0, δ), φ is δ'-robust
as well. Soundness and relative completeness then follow naturally.


Theorem 2 (Soundness). If a δ-complete decision procedure confirms that
LF_ε(f, V) is true, then V is indeed an ε-Lyapunov function, and f is ε-stable.


Proof. By Definition 4, the answer true is returned only if LF_ε(f, V), exactly as
specified in Definition 8, is true. Therefore, V is ε-Lyapunov. By Theorem 1, f is ε-stable.
Theorem 3 (Relative Completeness). For any ε ∈ ℝ⁺, if LF_ε(f, V) is true,
then there exists δ ∈ ℚ⁺ such that any δ-complete decision procedure must return
that LF_ε(f, V) is true.


Proof. Fix an arbitrary ε ∈ ℝ⁺ for which LF_ε(f, V) is true. Let φ := LF_ε(f, V),
and using Lemma 1, let δ ∈ ℚ⁺ be such that φ is δ-robust. Since φ is true, we
conclude that φ^{+δ} is true as well. By Definition 4, no δ-complete decision procedure
can return δ-false for φ, so it must return true.


We remark that the quantifier alternation used in Definition 8 can be
eliminated without extra search steps, so SMT solving is only needed for the
universally quantified subformulas. The reason is that the α, β, and γ
parameters can be found by estimating the range of V(x) and ∇_f V(x)
in the different neighborhoods. In fact, we can rewrite LF_ε(f, V) in the following
way to eliminate the use of α, β, and γ:

$$\mathrm{LF}_\varepsilon(f,V) \leftrightarrow \exists^{(0,\varepsilon)}\varepsilon'\ \Big( \sup_{x \in B_{\varepsilon'}} V(x) < \inf_{x \in D\setminus B_\varepsilon} V(x) \ \wedge\ \sup_{x \in D\setminus B_{\varepsilon'}} \nabla_f V(x) < 0 \Big)$$

Note that in this form the universal quantification is implicit in the sup and inf
operators. In this way, the formula is existentially quantified only over ε', which
can then be handled by binary search. This is an efficient way of checking the
conditions in practice. We also remark that, even without this method, the original
formulation with multiple parameters can be solved directly as ∃∀-formulas
using more expensive algorithms [13].
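The paper does not spell out the binary search. The sketch below is one possible reading. It assumes that the gap condition (sup over B_{ε'} below inf outside B_ε) only gets easier as ε' shrinks, and the Lie-derivative condition only gets easier as ε' grows. In practice each oracle would be a δ-decision query. Here the oracles are closed forms for the hypothetical system ẋ = ηx − x³ with V(x) = x², for which the feasible ε' are exactly those in (√η, ε).

```python
from typing import Callable, Optional, Tuple


def search_eps_prime(eps: float,
                     inner_ok: Callable[[float], bool],
                     outer_ok: Callable[[float], bool],
                     iterations: int = 60) -> Optional[float]:
    """Bisection for eps' in (0, eps).

    inner_ok(e): sup of V over B_e  <  inf of V outside B_eps  (easier for small e)
    outer_ok(e): sup of the Lie derivative outside B_e  <  0   (easier for large e)
    Returns a witness eps', or None if none was found.
    """
    lo, hi = 0.0, eps
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        a, b = inner_ok(mid), outer_ok(mid)
        if a and b:
            return mid
        if not a and not b:
            return None   # under the monotonicity assumption, no eps' works
        if not a:
            hi = mid      # gap condition fails: shrink eps'
        else:
            lo = mid      # Lie-derivative condition fails: grow eps'
    return None


def make_oracles(eta: float, eps: float) -> Tuple[Callable[[float], bool], Callable[[float], bool]]:
    """Closed-form oracles for the hypothetical system dx/dt = eta*x - x**3, V(x) = x**2."""
    def inner_ok(e: float) -> bool:
        return e * e < eps * eps          # sup_{|x|<e} x^2 = e^2, inf_{|x|>=eps} x^2 = eps^2

    def outer_ok(e: float) -> bool:
        s = max(e * e, eta / 2.0)         # Lie derivative is 2s(eta - s) with s = x^2
        return 2.0 * s * (eta - s) < 0.0  # its sup over s >= e^2 is attained at s

    return inner_ok, outer_ok


inner_ok, outer_ok = make_oracles(eta=0.04, eps=0.5)
print(search_eps_prime(0.5, inner_ok, outer_ok))  # 0.25
```

If ε ≤ √η, the search finds no witness, which matches the fact that V(x) = x² is not an ε-Lyapunov function for such ε in this example.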


4 Robust Proofs for Safety 


In this section, we define two types of ε-barrier functions that are robust to
numerical perturbations.

Proving unbounded safety requires the use of barrier functions. The idea is
that if one can find a barrier function that separates the initial states from the
set of unsafe states, such that no trajectory can cross the barrier from the safe
to the unsafe side, then the system is safe. Here we use a formulation similar
to that of Prajna [21]. The standard conditions on barrier functions include
constraints on the vector field of the system at the exact boundary of the barrier
set, which introduces robustness problems. We show that it is possible to avoid
these problems using two different formulations, which we call Type 1 and Type 2
ε-barrier functions. Type 1 ε-barrier functions strengthen the original definition
and require strict contraction of the barrier. Instead of only asking the system to
be contractive exactly on the barrier's border, we force it to be contractive when
reaching any state within a small distance from the border. Type 2 ε-barrier
functions allow the system to escape the barrier for a controllable distance and
a limited period of time, after which it must return to the interior of the safe region.
Type 1 ε-barriers can be seen as a subclass of Type 2 ε-barriers. The benefit
of allowing bounded escape is that the shape of the barrier no longer needs
to be an invariant set, which can be particularly helpful when the shape of the
system invariants cannot be determined or expressed symbolically. The
downside of Type 2 ε-barriers is that checking the corresponding conditions requires
integration of the dynamics, which can be expensive but can still be handled
by δ-complete decision procedures. The intuition behind the two definitions is
shown in Fig. 2 and will be explained in detail in this section.


4.1 Safety and Barrier Functions 


Before formally introducing robust safety and ε-barrier functions, we first define
safety and barrier functions. The robustness problem with barrier functions
is similar to that of Lyapunov functions: if the boundary exactly separates
the safe and unsafe regions, then the inductive conditions
are not robust, since even a small deviation in the variables away from
the barrier makes it impossible to complete the proof.


Definition 9 (Safety). Let B : D → ℝ be a scalar function defined for the
system in (1). We say B < 0 defines a safe (or forward invariant) set for the
system iff the following formula is true:

$$\mathrm{Safe}(f, init, B) =_{\mathrm{df}} \forall^{D} x_0\ \forall^{[0,\infty)} t\ \big(init(x_0) \rightarrow B(F(x_0,t)) < 0\big).$$

Definition 10 (Barrier Function). Let B : D → ℝ be a differentiable scalar
function defined for the system in (1). We say B is a barrier function for the
system iff the following formula is true:

$$\mathrm{Barrier}(f, init, B) =_{\mathrm{df}} \forall^{D} x\ \Big( \big(init(x) \rightarrow B(x) < 0\big) \wedge \big(B(x) = 0 \rightarrow \nabla_f B(x) < 0\big) \Big)$$


Proposition 3. Barrier(f, init, B) → Safe(f, init, B).


[Figure: (a) Safety (b) Barrier (c) Type 1 ε-Barrier (d) Type 2 ε-Barrier]


Fig. 2. Type 1 and Type 2 ε-Barriers
4.2 Type 1: Strict Contraction


In the standard definition, the boundary of the barrier set is typically a manifold
defined by an equality, which is not numerically robust. To avoid this problem, we
need the barrier boundary to be belt-shaped, in the sense that there is a clear gap
between the safe and unsafe regions. The idea is shown in Fig. 2c: we need a
second and stronger barrier defined by B = −ε for some suitable ε > 0, so that
the system is clearly separated from B = 0. The formal definition is as follows.


Definition 11 (ε-Barrier Certificates). Let ε ∈ ℝ⁺ be arbitrary. A
differentiable scalar function B : D → ℝ is an ε-barrier function iff the following
conditions are true:


- For all x, init(x) implies B(x) < −ε.
- There exists γ ∈ ℝ⁺ such that for all x, B(x) = −ε implies ∇_f B(x) < −γ.


Formally, the condition is defined as

$$\mathrm{Barrier}_\varepsilon(f, init, B) =_{\mathrm{df}} \forall^{D} x\ \big(init(x) \rightarrow B(x) < -\varepsilon\big) \wedge \exists^{(0,\infty)}\gamma\ \forall^{D} x\ \big(B(x) = -\varepsilon \rightarrow \nabla_f B(x) < -\gamma\big)$$


It should be intuitively clear from the definition that the existence of ε-barrier
functions is sufficient for establishing invariants and safety properties. The new
requirement is that the system stays robustly within the barrier, with a margin
given by the region −ε < B(x) < 0.
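Before a δ-decision query is issued, a cheap sampling pass over the level set B(x) = −ε can catch candidates that obviously violate the contraction condition. The sketch below does this for a hypothetical system f(x) = (−x_1 + x_2, −x_1 − x_2) with B(x) = x_1² + x_2² − 1, so that B = −ε is a circle of radius √(1 − ε). Sampling can only refute the condition. Proving it still requires a solver such as dReal.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def max_lie_on_circle(lie_b: Callable[[Point], float], radius: float,
                      samples: int = 3600) -> float:
    """Largest sampled value of the Lie derivative of B on a circle of the given radius."""
    best = -math.inf
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        best = max(best, lie_b((radius * math.cos(theta), radius * math.sin(theta))))
    return best


def f(x: Point) -> Point:
    """Hypothetical stable vector field used only for this illustration."""
    return (-x[0] + x[1], -x[0] - x[1])


def lie_b(x: Point) -> float:
    """Lie derivative of the hypothetical barrier B(x) = x1^2 + x2^2 - 1 along f."""
    fx = f(x)
    return 2.0 * x[0] * fx[0] + 2.0 * x[1] * fx[1]


def no_counterexample(eps: float, gamma: float) -> bool:
    """True if no sample on B(x) = -eps violates Lie_f B(x) < -gamma (not a proof)."""
    return max_lie_on_circle(lie_b, math.sqrt(1.0 - eps)) < -gamma


print(no_counterexample(0.1, 1.0), no_counterexample(0.1, 1.9))  # True False
```

For this system, the Lie derivative is −2(x_1² + x_2²), which equals −2(1 − ε) = −1.8 on the level set when ε = 0.1. The sampled check therefore passes for γ = 1 and fails for γ = 1.9.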


Theorem 4. For any ε ∈ ℝ⁺, Barrier_ε(f, init, B) → Safe(f, init, B).


Proof. Assume Barrier_ε(f, init, B) is true. It is easy to see that Barrier(f, init, B + ε), as
specified in Definition 10, is also true. Therefore, using Proposition 3, we know that
Safe(f, init, B + ε), and hence Safe(f, init, B), are both true.
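The "easy to see" step can be spelled out by checking the two conjuncts of Definition 10 for the shifted function B + ε, using only Definition 11:

$$
\begin{aligned}
init(x) &\rightarrow B(x) < -\varepsilon \ \rightarrow\ B(x) + \varepsilon < 0,\\
B(x) + \varepsilon = 0 &\leftrightarrow B(x) = -\varepsilon \ \rightarrow\ \nabla_f (B + \varepsilon)(x) = \nabla_f B(x) < -\gamma < 0.
\end{aligned}
$$

Here the Lie derivative is unchanged because ε is a constant. Finally, B(F(x_0, t)) + ε < 0 implies B(F(x_0, t)) < −ε < 0, which gives Safe(f, init, B).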


It is clear that there is room for numerically perturbing the size of the area 
and still obtaining a robust proof. The proof is similar to the one for Lemma 1 
as shown in [8]. 


Theorem 5. For any ε ∈ ℝ⁺, there exists δ ∈ ℚ⁺ such that Barrier_ε(f, init, B)
is a δ-robust formula.


Example 2 (Type 1 ε-Barrier for time-reversed Van der Pol). Consider the
time-reversed Van der Pol system introduced in Example 1. We use the same
example to demonstrate the effect of numerical errors in proving barrier
certificates. The level sets of the Lyapunov function in the stable region are
barrier certificates; however, for barriers that are very close to the limit
cycle, numerical sensitivity becomes a problem. In our experiments, with ε = 10^−5
and δ = 10^−4, we can verify that the level set zᵀPz = 90 is a Type 1
ε-barrier. Table 2 lists the parameters used in this proof. Fig. 3 (Left) shows the
direction field of the time-reversed Van der Pol dynamics, the border of the
set zᵀPz < 90, which we prove is a Type 1 ε-barrier, and the boundary of the set
zᵀPz < 110, which is clearly not a barrier, since it lies outside the limit cycle.
Fig. 3. (Left) Van der Pol example (Right) Type 2 barrier example


The conditions for ε-Lyapunov and ε-barrier functions look very similar, but
there is an important difference. In the case of Lyapunov functions, we do not
evaluate the Lie derivative on the boundaries of the balls. Thus, the balls do not
define barrier sets. On the other hand, the level sets of Lyapunov functions always
define barriers.


Remark 3. ε-barrier functions can also be used as a sufficient condition for
ε-stability, if a barrier can be found within the ε-ball required by ε-stability.


Remark 4. A technical requirement for proving robustness of the ε-barrier
conditions is that ¬init defines a simple set that can be over-approximated, such
that for every ε ∈ ℝ⁺, there is δ ∈ ℝ⁺ such that for any point that satisfies
¬init^{+δ} there is an ε-close point that satisfies ¬init. A sufficient condition for
this restriction is that init be of the form (⋀_i a_i < x_i < b_i) → φ(x), where
a_i, b_i ∈ ℚ are arbitrary constants, and φ is a quantifier-free formula with only
strict inequalities [22].


4.3 Type 2: Bounded Escape 


We now introduce the second set of conditions for establishing ε-invariant sets.
This set of conditions can be used only when ε-variations are considered. This
notion is inspired by the notion of k-step invariants [3] for discrete-time systems.
The ε-margin that we allow at the boundary of the invariants lets us exploit
more techniques. Using reachable-set computation, we can directly check whether all
states stay within the barrier set at each step. To ensure that the conditions are
inductive and useful, we impose the following two requirements:


- (Contraction) Similar to the strengthening in barrier certificates, we require
that the system does not sit at the boundary: the dynamics at the boundary
should be contracting. The difference from Type 1 ε-barriers is that this
condition is not imposed through the vector field on the boundary. Instead,
it is a reachability condition: after some amount of time, all states should
return to the interior of an appropriate set.

- (Bounded Escape) Before returning to the invariant set, we allow the
system to step outside the invariant, but only up to a bounded distance from
the boundary.


The intuition is depicted in Fig. 2d. In the formal definition, we parameterize
the conditions with the time for contraction and the maximum deviation from
the invariant set, as follows.


Definition 12 (Type 2 Barrier Functions). Let T, ε ∈ ℝ⁺ be arbitrary. We
say a continuous scalar function B defines a (T, ε)-elastic barrier function iff
the following conditions hold:


1. For any x, init(x) implies B(x) < −ε.

2. There exists ε' > ε such that any state in B(x) < −ε will enter B(x) < −ε'
after time T.

3. During time [0, T], the system may step outside of B(x) < −ε, but there exists
some ε* ∈ (0, ε] such that all states stay within B(x) < −ε*.


In all, we define the conditions with the following formula:

$$
\begin{aligned}
\mathrm{Barrier}_{T,\varepsilon}(f, init, B) =_{\mathrm{df}}\ & \forall^{D} x\ \big(init(x) \rightarrow B(x) < -\varepsilon\big)\\
\wedge\ & \exists^{(0,\varepsilon]}\varepsilon^*\ \forall^{D} x\ \forall^{[0,T]} t\ \big(B(x) = -\varepsilon \rightarrow B(F(x,t)) < -\varepsilon^*\big)\\
\wedge\ & \exists^{(\varepsilon,\infty)}\varepsilon'\ \forall^{D} x\ \big(B(x) = -\varepsilon \rightarrow B(F(x,T)) < -\varepsilon'\big)
\end{aligned}
$$

Theorem 6 shows that the conditions in Definition 12 ensure that the system
never leaves the invariant B < 0. The key is the second condition: induction
works because all states come back to the interior of the set defined by B < −ε.
With the third condition alone, we cannot perform induction because the set may
keep growing.


Theorem 6. For any T, ε ∈ ℝ⁺, Barrier_{T,ε}(f, init, B) → Safe(f, init, B).


Proof. For the purpose of contradiction, suppose that, starting from x_0 ∈ init, the
system is unsafe. Using continuity of the barrier B and the solution function F,
let t ∈ ℝ_{≥0} be a time at which B(x(t)) = 0, where x(t) is by definition F(x_0, t).
By the 1st property in Definition 12, we know B(x_0) < −ε < 0. Using continuity
of B and F, let t' ∈ [0, t) be the supremum of all times before t at which B(x(t')) = −ε.
By the 3rd property in Definition 12, we know t − t' > T, and by the 2nd property
in Definition 12, we know B(x(t' + T)) < −ε' < −ε. Using continuity of B and
F, we know there is a time t'' ∈ (t' + T, t) at which B(x(t'')) = −ε. However,
this contradicts t' being the supremum.


Theorem 7. For any ε ∈ ℝ⁺, there exists δ ∈ ℚ⁺ such that Barrier_{T,ε}(f, init, B)
is a δ-robust formula.
Example 3. We use this example to show how Type 2 ε-barriers can be used to
establish safety. Consider the following system.


tı = —0.1 —10 Ly 
l= [eS] [2] 

Let init be the set {x | −0.1 < x_1 < 0.1, −0.1 < x_2 < 0.1}, and let U, the unsafe
set, be the set {x | −2.0 < x_1 < −1.1, −2.0 < x_2 < −1.1}. The system is stable
and safe with respect to the designated unsafe set. However, in the standard
definition, safety cannot be shown using any invariant of the form
B(x) := x_1² + x_2² − c < 0, where c ∈ ℚ⁺ is a constant. This is because the vector
field on the boundary of such sets does not satisfy the inductive conditions. Nevertheless, we
can show that for c = 1, B(x) is a Type 2 ε-barrier. The dReal query verifies the
conditions with ε = 0.1. Since U(x) → B(x) > ε and init(x) → B(x) < −ε', we
know that the system cannot reach any unsafe states. Fig. 3 (Right) illustrates
the example. The green set at the center represents init, and the red set represents
the unsafe set U. The B(x) = 0 level set is not invariant, as evidenced in the figure
by the forward images at t = 0.14 and t = 0.28 leaving the set; however, as
the dReal query proves, the reachable set over 0 ≤ t ≤ 10 does not leave the
B(x) = 1.0 level set and is completely contained in the B(x) = −0.1 level set by
t = 0.4. Since U(x) → B(x) > 1.0 and init(x) → B(x) < −0.1, the system
cannot reach any state in U.
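The two reachability conditions of Definition 12 can also be probed by sampling. The system matrix of Example 3 is not fully legible in this text, so the sketch below uses a different, hypothetical linear system ẋ = Ax with A = [[−1, 3], [0, −1]]. Its flow has the exact form e^{At} = e^{−t}[[1, 3t], [0, 1]]. With B(x) = x_1² + x_2² − 1, ε = 0.5, and T = 5, the level set B = −ε is not invariant, yet trajectories from it stay well inside B < 0 and have contracted by time T. Sampling a finite set of states and times is only evidence, not the reachability proof that dReal provides.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def flow(x: Point, t: float) -> Point:
    """Exact solution of the hypothetical system dx/dt = A x, A = [[-1, 3], [0, -1]]."""
    decay = math.exp(-t)
    return ((x[0] + 3.0 * t * x[1]) * decay, x[1] * decay)


def barrier(x: Point) -> float:
    """Candidate Type 2 barrier B(x) = x1^2 + x2^2 - 1."""
    return x[0] ** 2 + x[1] ** 2 - 1.0


def sampled_type2_margins(eps: float, horizon: float,
                          n_angles: int = 360, n_times: int = 500) -> Tuple[float, float]:
    """Sample states on B(x) = -eps and return
    (largest B along trajectories over [0, horizon], largest B at time horizon)."""
    r = math.sqrt(1.0 - eps)
    worst_during, worst_final = -math.inf, -math.inf
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        x0 = (r * math.cos(theta), r * math.sin(theta))
        for j in range(n_times + 1):
            worst_during = max(worst_during, barrier(flow(x0, horizon * j / n_times)))
        worst_final = max(worst_final, barrier(flow(x0, horizon)))
    return worst_during, worst_final


during, final = sampled_type2_margins(eps=0.5, horizon=5.0)
print(f"max B over [0, T]: {during:.3f}, max B at T: {final:.3f}")
```

The sampled maximum over [0, T] rises to roughly −0.23, above −ε = −0.5, so the circle is not invariant. It stays below −0.2, which is consistent with condition 3 for ε* = 0.2. At T = 5 every sample is below −0.9, which is consistent with condition 2 for ε' = 0.9 > ε.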


5 Experiments 


In this section, we show examples of nonlinear systems that can be verified to
be ε-stable or safe with ε-barriers.


Table 1. Results for the ε-Lyapunov functions. Each Lyapunov function is of the
form zᵀPz, where z is a vector of monomials over the state variables. We report the
constant values satisfying the ε-Lyapunov conditions, and the time that verification of
each example takes (in seconds).

Example          | α            | β            | γ      | ε       | ε'         | Time (s)
-----------------|--------------|--------------|--------|---------|------------|---------
T.R. Van der Pol | 2.10 × 10^−? | 1.70 × 10^−? | 10^−?  | 10^−12  | 5 × 10^−?  | 0.05
Norm. Pend.      | 7.07 × 10^−? | 3.97 × 10^−? | 10^−?  | 10^−12  | 5 × 10^−?  | 0.01
Moore-Greitzer   | 2.95 × 10^−? | 2.55 × 10^−? | 10^−?  | 10^−?   | 5 × 10^−?  | 0.04


Table 1 contains the parameters we use to verify the requirements of Definition 8
for ε-Lyapunov functions in our examples. Table 2 contains the parameters we use
to verify the requirements of Definition 11 for Type 1 ε-barrier functions in our
examples. The ε-Lyapunov functions in these examples are of the form V(x) =
zᵀPz, where z is a vector of products of the state variables and P is a constant
matrix obtained using simulation-guided techniques from [10]. All the P matrices
are given in [8].


Table 2. Results for the ε-barrier functions. Each barrier function B(x) is of the form
zᵀPz − ℓ, where z is a vector of monomials over x. We indicate the highest degree of
the monomials used in z, the size of P, the level ℓ used for each barrier function,
and the values of ε and γ used to check ∇_f B(x) < −γ.

Example          | ℓ         | ε      | γ      | degree(z) | Size of P | Time (s)
-----------------|-----------|--------|--------|-----------|-----------|---------
T.R. Van der Pol | 90        | 10^−5  | 10^−7  | 3         | 9 × 9     | 6.47
Norm. Pend.      | [0.1, 10] | 10^−2  | 10^−2  | 1         | 2 × 2     | 0.08
Moore-Greitzer   | [1.0, 10] | 10^−1  | 10^−1  | 4         | 5 × 5     | 13.80
PTC              | 0.01      | 10^−5  | 10^−5  | 2         | 14 × 14   | 428.75


Time-Reversed Van der Pol. The time-reversed Van der Pol system has been 
used as an example in the previous sections. Figure 3 (Left) shows the direction 
field of this system around the origin. Using dReal with δ := 10^−?, we are able
to establish a 10^−?-Lyapunov function and a 10^−?-barrier function.


Normalized Pendulum. A standard pendulum system has continuous dynamics
containing a transcendental function, which causes difficulty for many
techniques. Here, we consider a normalized pendulum system with the following
dynamics, in which x_1 and x_2 represent angular position and velocity,
respectively. In our experiment, using δ = 10^−50, we can prove that the function
V = xᵀPx is ε-Lyapunov, where ε := 10^−?.


H 7 |- m oF ” (3) 


Using δ := 0.01, we are able to prove that for any value ℓ ∈ [0.1, 10], the function
B(x) := xᵀPx − ℓ, with x being the system state and P a constant matrix given
in [8], is a Type 1 0.01-barrier function.


Moore-Greitzer Jet Engine. Next, we consider a simplified version of the 
Moore-Greitzer model for a jet engine. The system has the following dynamics, 
in which x_1 and x_2 are states related to mass flow and pressure rise.


Be = a (4) 


In our experiment, using δ = 10^−? and z := [x_1², x_1x_2, x_2², x_1, x_2]ᵀ, we can prove
that the function V := zᵀPz is ε-Lyapunov, where ε := 10^−?.

Using dReal with δ := 0.1, we are able to prove that for any value ℓ ∈ [1, 10],
the function B(x) := zᵀPz − ℓ, with x being the system state, z the vector
of monomials defined above, and P a constant matrix given
in [8], is a Type 1 0.1-barrier function.
Powertrain Control System. Next, we consider a closed-loop model of a
powertrain control (PTC) system for an automotive application. The system
dynamics consist of four state variables, two associated with the plant and two with
the controller. The plant models the fuel and air dynamics of an internal combustion
engine, and the controller is designed to regulate the air-fuel (A/F) ratio within
a given range of an optimal value, referred to as the stoichiometric value. The two states
related to the plant represent the manifold pressure, p, and the ratio between the
actual A/F ratio and the stoichiometric value, r. The two states associated with the
controller are the estimated manifold pressure, p_est, and the internal state of the PI
controller, i. The system is highly nonlinear, with the following dynamics:


2 
p =a |2 2 ( p ) (cs cacop + ¢5c2p” + cec2p) 
11 1 


2 2 
bo =4 C3 + Cacap + C5C2pP° + C6C2p A 
C13 (C3 + C4C2Pest + C5C2P2st + C6C2Pest)(1 +i + c14(r — c16)) 
2 
A = Hi P Pp i 2 2 
Best = C1 | 2i 2 - (2 C13 (C3 + C4C2Pest + C5C2Pest + C6CSPest 
i = cis(r — cs) 


which follow the detailed description of the model and the constant parameter values in [10]. We verified that there exists a function of the form $B(x) = z^T P z - 0.01$ (where $z$ consists of 14 monomials with a maximum degree of 2) such that $\nabla_f B(x) < -\gamma$ whenever $B(x) = -\varepsilon$.


6 Conclusion 


We formulated new inductive proof rules for stability and safety for dynamical 
systems. The rules are numerically robust, making them amenable to verification 
using automated reasoning tools such as those based on δ-decision procedures.
We presented several examples demonstrating the value of the new approach, 
including safety verification tasks for highly nonlinear systems. The examples 
show that the framework can be used to prove stability and safety for systems
that were out of reach for existing tools. The new framework relies on the ability 
to generate reasonable candidate Lyapunov functions, which are analogous to 
ranking functions from program analysis. Future work will include improved 
techniques for efficiently generating the ε-Lyapunov and ε-barrier functions and
related theoretical questions. 
Abstract. Verified compilers like CompCert and CakeML offer increasingly sophisticated optimizations. However, their deterministic source
semantics and strict IEEE 754 compliance prevent the verification of 
“fast-math” style floating-point optimizations. Developers often selec- 
tively use these optimizations in mainstream compilers like GCC and 
LLVM to improve the performance of computations over noisy inputs 
or for heuristics by allowing the compiler to perform intuitive but IEEE 
754-unsound rewrites. 

We designed, formalized, implemented, and verified a compiler for 
Icing, a new language which supports selectively applying fast-math style 
optimizations in a verified compiler. Icing’s semantics provides the first 
formalization of fast-math in a verified compiler. We show how the Icing 
compiler can be connected to the existing verified CakeML compiler and 
verify the end-to-end translation by a sequence of refinement proofs from 
Icing to the translated CakeML. We evaluated Icing by incorporating sev- 
eral of GCC’s fast-math rewrites. While Icing targets CakeML’s source 
language, the techniques we developed are general and could also be 
incorporated in lower-level intermediate representations. 


Keywords: Compiler verification · Floating-point arithmetic · Optimization


1 Introduction 


Verified compilers formally guarantee that compiled machine code behaves 
according to the specification given by the source program’s semantics. This 
stringent requirement makes verifying “end-to-end” compilers for mainstream 
languages challenging, especially when proving sophisticated optimizations that 
developers rely on. Recent verified compilers like CakeML [38] for ML and 
CompCert [24] for C have been steadily verifying more of these important optimizations [39-41]. While the gap between verified compilers and mainstream
alternatives like GCC and LLVM has been shrinking, so-called “fast-math” 
floating-point optimizations remain absent in verified compilers. 

Fast-math optimizations allow a compiler to perform rewrites that are often 
intuitive when interpreted as real-valued identities, but which may not preserve 
strict IEEE 754 floating-point behavior. Developers selectively enable fast-math 
optimizations when implementing heuristics, computations over noisy inputs, 
or error-robust applications like neural networks—typically at the granularity 
of individual source files. The IEEE 754-unsound rewrites used in fast-math 
optimizations allow compilers to perform strength reductions, reorder code to 
enable other optimizations, and remove some error checking [1,2]. Together these 
optimizations can provide significant savings and are widely used in performance-critical applications [12].

Unfortunately, strict IEEE 754 source semantics prevents proving fast-math 
optimizations correct in verified compilers like CakeML and CompCert. Simple 
strength-reducing rewrites like fusing the expression x * y + z into a faster and 
locally-more-accurate fused multiply-add (fma) instruction cannot be included 
in such verified compilers today. This is because fma avoids an intermediate 
rounding and thus may not produce exactly the same bit-for-bit result as the 
unoptimized code. More sophisticated optimizations like vectorization and loop-invariant code motion depend on reordering operations to make expressions available, but these cannot be verified since floating-point arithmetic is not associative. Even simple reductions like rewriting x − x to 0 cannot be verified since the result is actually NaN (“not a number”) if x is NaN or infinite. Each of these cases represents a rewrite that developers would often, in principle, be willing to apply
manually to improve performance but which can be more conveniently handled 
by the compiler. Verified compilers’ strict IEEE 754 source semantics similarly 
hinders composing their guarantees with recent tools designed to improve accu- 
racy of a source program [14, 16,32], as these tools change program behavior to 
reduce rounding error. In short, developers today are forced to choose between 
verified compilers and useful tools based on floating-point rewrites. 
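The following Python sketch, which is not part of Icing, checks the three claims above on concrete doubles. Because a fused multiply-add is only exposed in recent Python versions, fma_exact is a hypothetical helper that emulates the single rounding of fma with exact rational arithmetic from the standard fractions module; it assumes finite inputs.

```python
from fractions import Fraction


def fma_exact(x: float, y: float, z: float) -> float:
    """Emulate fma: compute x * y + z exactly, then round once to a double.

    Fraction arithmetic is exact for finite floats, and float(Fraction)
    rounds the exact rational to the nearest double.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


# 1. Reassociation changes results: floating-point + is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# 2. Rewriting x - x to 0 is wrong when x is NaN or infinite.
nan_case = float("nan") - float("nan")
inf_case = float("inf") - float("inf")

# 3. fma skips the intermediate rounding of x * y.
separate = 0.1 * 10.0 + (-1.0)       # 0.1 * 10.0 rounds to exactly 1.0 first
fused = fma_exact(0.1, 10.0, -1.0)   # the exact product keeps a tiny residue

print(left, right)                    # 0.6000000000000001 0.6
print(nan_case, inf_case)             # nan nan
print(separate, fused == 2.0 ** -54)  # 0.0 True
```

Each printed pair differs, so a compiler applying any of these rewrites cannot guarantee bit-for-bit IEEE 754 results.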

The crux of the mismatch between verified compilers and fast-math lies in the 
source semantics: verified compilers implement strict IEEE 754 semantics while 
developers are intuitively programming against a looser specification of floating- 
point closer to the reals. Developers currently indicate this perspective by passing compiler flags like -ffast-math for the parts of their code written against this looser semantics, enabling mainstream compilers to aggressively optimize those components. Ideally, verified compilers will eventually support such loosened semantics by providing an “approximate real” data type and letting the developer specify error bounds under which the compiler could freely apply any optimization that stays within bounds. A good interface to tools for analyzing finite-precision computations [11, 16] could even allow independently established formal accuracy guarantees to be composed with compiler correctness.
As an initial step toward this goal, we present a pragmatic and flexible
approach to supporting fast-math optimizations in verified compilers. Our app- 
roach follows the implicit design of existing mainstream compilers by provid- 
ing two complementary features. First, our approach provides fine-grained con- 
trol over which parts of a program the compiler may optimize under extended 
floating-point semantics. Second, our approach provides flexible extensions to 
the floating-point semantics specified by a set of high-level rewrites which can 
be specialized to different parts of a program. The result is a new nondeterminis- 
tic source semantics which grants the compiler freedom to optimize floating-point 
code within clearly defined bounds. 

Under such extended semantics, we verify a set of common fast-math opti- 
mizations with the simulation-based proof techniques already used in verified 
compilers like CakeML and CompCert, and integrate our approach with the 
existing compilation pipeline of the CakeML compiler. To enable these proofs, 
we provide various local lemmas that a developer can prove about their rewrites 
to ensure global correctness of the verified fast-math optimizer. Several challenges 
arise in the design of this decomposition including how to handle “duplicating 
rewrites” like distributivity that introduce multiple copies of a subexpression 
and how to connect context-dependent rewrites to other analyses (e.g., from 
accuracy-verification tools) via rewrite preconditions. Our approach thus pro- 
vides a rigorous formalization of the intuitive fast-math semantics developers 
already use, provides an interface for dispatching proof obligations to formal 
numerical analysis tools via rewrite preconditions, and enables bringing fast- 
math optimizations to verified compilers. 

In summary, the contributions of this paper are: 


— We introduce an extensible, nondeterministic semantics for floating-point 
computations which allows for fast-math style compiler optimizations with 
flexible, yet fine-grained control in a language we call Icing. 

— We implement three optimizers based on Icing: a baseline strict optimizer 
which provably preserves IEEE 754 semantics, a greedy optimizer, which 
applies any available optimization, and a conditional optimizer which applies 
an optimization whenever an (optimization-specific) precondition is satisfied. 
The code is available at https://gitlab.mpi-sws.org/AVA/Icing.

— We formalize Icing and verify our three different optimizers in HOL4. 

— We connect Icing to CakeML via a translation from Icing to CakeML source 
and verify its correctness via a sequence of refinement proofs. 


2 The Icing Language 


In this section we define the Icing language and its semantics to support fast- 
math style optimizations in a verified compiler. Icing is a prototype language 
whose semantics is designed to be extensible and widely applicable instead of 
focusing on a particular implementation of fast-math optimizations. This allows 
us to provide a stable interface as the implementation of the compiler changes, 
as well as supporting different optimization choices in the semantics, depending 
on the compilation target. 
2.1 Syntax


Icing’s syntax is shown in Fig. 1. In addition to arithmetic, let-bindings and conditionals, Icing supports fma operators, lists ([e1, ...]), projections (e1[n]), and Map and Fold as primitives. Conditional guards consist of boolean constants (b), binary comparisons (e1 ⊙ e2), and an isNaN predicate. isNaN e1 checks whether e1 is a so-called Not-a-Number (NaN) special value. Under the IEEE 754 standard, undefined operations (e.g., the square root of a negative number) produce NaN results, and most operations propagate NaN results when passed a NaN argument. It is thus common to add checks for NaNs at the source or compiler level.


w : 64-bit floating-point word    x : String    n ∈ ℕ    b ∈ {True, False}
⋄ ∈ {−, sqrt}    ∘ ∈ {+, −, *, /}    ⊙ ∈ {<, ≤, =}

e1, e2, e3 ::= w | x | [e1, ...] | e1[n] | ⋄ e1 | e1 ∘ e2 | fma(e1, e2, e3) | opt:(e1) |
               let x = e1 in e2 | if c then e1 else e2 | Map (λx. e1) e2 | Fold (λx y. e1) e2 e3

c ::= b | isNaN e1 | e1 ⊙ e2 | opt:(c)


Fig. 1. Syntax of Icing expressions 


We use the Map and Fold primitives to show that Icing can be used to express 
programs beyond arithmetic, while keeping the language simple. Language fea- 
tures like function definitions or general loops do not affect floating-point com- 
putations with respect to fast-math optimizations and are thus orthogonal. 

The opt: scoping annotation implements one of the key features of Icing: 
floating-point semantics are relaxed only for expressions under an opt: scope. In 
this way, opt: provides fine-grained control both for expressions and conditional 
guards. 


2.2 Optimizations as Rewrites 


Fast-math optimizations are typically local and syntactic, i.e., peephole rewrites. 
In Icing, these optimizations are written as s → t to denote finding any subexpression matching pattern s and rewriting it to t, using the substitution from matching s to instantiate pattern variables in t as usual. The find and replace
patterns of a rewrite are terms from the following pattern language which mirrors 
Icing syntax: 


p1, p2, p3 ::= w | b | x | ⋄ p1 | p1 ∘ p2 | p1 ⊙ p2 | fma(p1, p2, p3) | isNaN p1


Table 1 shows the set of rewrites currently supported in our development. 
While this set does not include all of GCC’s fast-math optimizations, it does 
cover the three primary categories: 


— performance- and precision-improving strength reduction, which fuses x * y + z into an fma instruction (Rewrite 1)

— reordering based on real-valued identities, here commutativity and associativity of + and *, double negation, and distributivity of * (Rewrites 2-5)

— simplifying computation based on (assumed) real-valued behavior for compu- 
tations by removing NaN error checks (Rewrite 6) 


A key feature of Icing’s design is that each rewrite can be guarded by a rewrite 
precondition. We distinguish compiler rewrite preconditions as those that must 
be true for the rewrite to be correct with respect to Icing semantics. Removing 
a NaN check, for example, can change the runtime behavior of a floating-point 
program: a previously crashing program may terminate or vice-versa. Thus a 
NaN check can only be removed if the value can never be a NaN.

In contrast, an application rewrite precondition guards a rewrite that can 
always be proven correct against the Icing semantics, but where a user may still 
want finer-grained control. By restricting the context where Icing may fire these 
rewrites, a user can establish end-to-end properties of their application, e.g., 
worst-case roundoff error. The crucial difference is that the compiler precondi- 
tions must be discharged before the rewrite can be proven correct against the 
Icing semantics, whereas the application precondition is an additional restriction 
limiting where the rewrite is applied for a specific application. 

A key benefit of this design is that rewrite preconditions can serve as an inter- 
face to external tools to determine where optimizations may be conditionally 
applied. This feature enables Icing to address limitations that have prevented 
previous work from proving fast-math optimizations in verified compilers [5] 
since “The only way to exploit these [floating-point] simplifications while pre- 
serving semantics would be to apply them conditionally, based on the results 
of a static analysis (such as FP interval analysis) that can exclude the prob- 
lematic cases.” [5] In our setting, a static analysis tool can be used to establish 
an application rewrite precondition, while compiler rewrite preconditions can be 
discharged during (or potentially after) compilation via static analysis or manual 
proof. 

This design choice essentially decouples the floating-point static analyzer 
from the general-purpose compiler. One motivation is that the compiler may per- 
form hardware-specific rewrites, which source-code-based static analyzers would 
generally not be aware of. Furthermore, integrating end-to-end verification of 
these rewrites into a compiler would require it to always run a global static 
analysis. For this reason, we propose an interface which communicates only the 
necessary information. 

Rewrites which duplicate matched subexpressions, e.g., distributing multi- 
plication over addition, required careful design in Icing. Such rewrites can lead 
to unexpected results if different copies of the duplicated expression are opti- 
mized differently; this also complicates the Icing correctness proof. We show 
how preconditions additionally enabled us to address this challenge in Sect. 4. 
Icing optimizes code by folding a list of rewrites over a program e:
rewrite ([], e) = e
rewrite ((s → t)::rws, e) =
  let e′ = if (matches e s) then (app (s → t) e) else e in
  rewrite (rws, e′)


For a rewrite s → t at the head of the list rws, rewrite (rws, e) checks if s matches e, applies the rewrite if so, and recurses on the remaining rewrites. Function rewrite is used in our optimizers
in a bottom-up traversal of the AST. Icing users can specify which rewrites may 
be applied under each distinct opt: scope in their code or use a default set 
(shown in Table 1). 
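To make the find-and-replace reading of s → t and the fold performed by rewrite concrete, here is a small Python sketch. The tuple encoding of terms, the helper names match, instantiate and rewrite, and the convention that bare strings inside a pattern are pattern variables are assumptions for illustration, not Icing's HOL4 definitions. Like the equations above, rewrite only tries each pattern at the root of the term it is given.

```python
from typing import Optional

# Hypothetical encoding for this sketch: a term is a float constant,
# ("var", name), or (operator, arg, ...). Inside a pattern, a bare string
# such as "x" is a pattern variable that matches any subterm.


def match(pat, term, subst: dict) -> Optional[dict]:
    """Extend subst so that pat instantiated by subst equals term, else None."""
    if isinstance(pat, str):                       # pattern variable
        if pat in subst:                           # repeated variable must agree
            return subst if subst[pat] == term else None
        return {**subst, pat: term}
    if isinstance(pat, tuple):                     # operator node
        if not (isinstance(term, tuple) and len(term) == len(pat)
                and term[0] == pat[0]):
            return None
        for p_arg, t_arg in zip(pat[1:], term[1:]):
            subst = match(p_arg, t_arg, subst)
            if subst is None:
                return None
        return subst
    # constant leaf: compare types too, since True == 1.0 in Python
    return subst if type(pat) is type(term) and pat == term else None


def instantiate(pat, subst: dict):
    """Replace the pattern variables of pat using subst."""
    if isinstance(pat, str):
        return subst[pat]
    if isinstance(pat, tuple):
        return (pat[0],) + tuple(instantiate(p, subst) for p in pat[1:])
    return pat


def rewrite(rws: list, e):
    """Fold the (lhs, rhs) rewrites over e, head of the list first."""
    for lhs, rhs in rws:
        subst = match(lhs, e, {})
        if subst is not None:
            e = instantiate(rhs, subst)
    return e


COMM_PLUS = (("+", "x", "y"), ("+", "y", "x"))
FMA_INTRO = (("+", ("*", "x", "y"), "z"), ("fma", "x", "y", "z"))

body = ("+", ("*", ("var", "x"), ("var", "x")), ("var", "y"))   # x * x + y
print(rewrite([COMM_PLUS, FMA_INTRO], body))   # commuted sum, no fma
print(rewrite([FMA_INTRO, COMM_PLUS], body))   # fma(x, x, y)
```

Applying commutativity first destroys the x * y + z shape that fma introduction looks for, so the order of the list decides the result.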


Table 1. Rewrites currently supported in Icing (∘ ∈ {+, *})

No. | Name              | Rewrite                          | Precondition
1   | fma introduction  | x * y + z → fma(x, y, z)         | application precond.
2   | ∘ associative     | (x ∘ y) ∘ z → x ∘ (y ∘ z)        | application precond.
3   | ∘ commutative     | x ∘ y → y ∘ x                    | application precond.
4   | double negation   | −(−x) → x                        | x well-typed
5   | * distributive    | x * (y + z) → (x * y) + (x * z)  | no control dependency on optimization result
6   | NaN check removal | isNaN x → false                  | x is not a NaN


2.3 Semantics of Icing 


Next, we explain the semantics of Icing, highlighting two distinguishing features. 
First, values are represented as trees instead of simple floating-point words, thus 
delaying evaluation of arithmetic expressions. Secondly, rewrites in the semantics 
are applied nondeterministically, thus relaxing floating-point evaluation enough 
to prove fast-math optimizations. 

We define the semantics of Icing programs in Fig. 2 as a big-step judgment of the form (cfg, E, e) → v. cfg is a configuration carrying a list of rewrites (s → t) representing allowed optimizations, and a flag (OptOk) tracking whether optimizations are allowed in the current program fragment, i.e., whether it is under an opt: scope. E is the (runtime) execution environment mapping free variables to values, and e is an Icing expression. The value v is the result of evaluating e under E using optimizations from cfg.

The first key idea of Icing’s semantics is that expressions are not evaluated to 
(64-bit) floating-point words immediately; the semantics rather evaluates them 
into value trees representing their computation result. As an example, if e1 evaluates to value tree v1 and e2 to v2, the semantics returns the value tree represented as v1 + v2 instead of the result of the floating-point addition of (flattened) v1 and v2. The syntax of value trees is:


cv ::= b | isNaN v1 | v1 ⊙ v2 | opt: cv
v1, v2, v3 ::= w | ⋄ v1 | v1 ∘ v2 | fma(v1, v2, v3) | opt: v1
Const:     (cfg, E, w) → w
Bool:      (cfg, E, b) → b
Var:       E(x) = r  ⟹  (cfg, E, x) → r
Unary:     (cfg, E, e) → v,  (⋄ v, cfg) rewritesTo r  ⟹  (cfg, E, ⋄ e) → r
Ith:       (cfg, E, e) → vl,  n < |vl|,  vl[n] = r  ⟹  (cfg, E, e[n]) → r
fma:       (cfg, E, e1) → v1,  (cfg, E, e2) → v2,  (cfg, E, e3) → v3,  (fma(v1, v2, v3), cfg) rewritesTo r  ⟹  (cfg, E, fma(e1, e2, e3)) → r
Let-bind:  (cfg, E, e1) → v1,  (cfg, E[x ↦ v1], e2) → v2  ⟹  (cfg, E, let x = e1 in e2) → v2
Scope:     (cfg with OptOk := true, E, e) → v  ⟹  (cfg, E, opt: e) → v
Binary:    (cfg, E, e1) → v1,  (cfg, E, e2) → v2,  (v1 ∘ v2, cfg) rewritesTo r  ⟹  (cfg, E, e1 ∘ e2) → r
If:        (cfg, E, c) → cv,  cTree2IEEE cv = b,  (cfg, E, e) → r, where e = e1 if b = True and e = e2 otherwise  ⟹  (cfg, E, if c then e1 else e2) → r
Map []:    (cfg, E, Map (λx.e) []) → []
Fold []:   (cfg, E, s) → v  ⟹  (cfg, E, Fold (λx y.e) s []) → v
Map cons:  (cfg, E, e1) → v1,  (cfg, E[x ↦ v1], e) → vres,  (cfg, E, Map (λx.e) el) → vl  ⟹  (cfg, E, Map (λx.e) (e1 :: el)) → vres :: vl
Fold cons: (cfg, E, e1) → v1,  (cfg, E, Fold (λx y.e) s el) → vres,  (cfg, E[x ↦ v1, y ↦ vres], e) → vfinal  ⟹  (cfg, E, Fold (λx y.e) s (e1 :: el)) → vfinal
isNaN:     (cfg, E, e1) → v1,  (isNaN v1, cfg) rewritesTo r  ⟹  (cfg, E, isNaN e1) → r
Compare:   (cfg, E, e1) → v1,  (cfg, E, e2) → v2,  (v1 ⊙ v2, cfg) rewritesTo r  ⟹  (cfg, E, e1 ⊙ e2) → r

Fig. 2. Nondeterministic Icing semantics 
let v1 = Map (λx. opt:(x + 3.0)) v1 in
let vsum = Fold (λx y. opt:(x * x + y)) 0.0 v1 in sqrt vsum


Fig. 3. A simple Icing program 


Constants are again defined as floating-point words and form the leaves of value
trees (variables obtain a constant value from the execution environment E). On
top of constants, value trees can represent the result of evaluating any floating- 
point operation Icing supports. 

The second key idea of our semantics is that it nondeterministically applies 
rewrites from the configuration cfg while evaluating expression e instead of just 
returning its value tree. In the semantics, we model the nondeterministic choice of 
an optimization result for a particular value tree v with the relation rewritesTo, 
where (cfg, v) rewritesTo r if either the configuration cfg allows for optimizations 
to be applied, and value tree v can be rewritten into value tree r using rewrites 
from the configuration cfg; or the configuration does not allow for rewrites to 
be applied, and v = r. Rewriting on value trees reuses several definitions from 
Sect. 2.2. We add the nondeterminism on top of the existing functions by making 
the relation rewritesTo pick a subset of the rewrites from the configuration cfg 
which are applied to value tree v. 

Icing’s semantics allows optimizations to be applied for arithmetic and com- 
parison operations. The rules Unary, Binary, fma, isNaN, and Compare first 
evaluate argument expressions into value trees. The final result is then nonde- 
terministically chosen from the rewritesTo relation for the obtained value tree 
and the current configuration. Evaluation of Map, Fold, and let-bindings follows 
standard textbook evaluation semantics and does not apply optimizations. 
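A minimal Python sketch of this nondeterminism, under stated assumptions: expressions and value trees share a tuple encoding, and rewrites are plain Python functions on value trees rather than patterns. The hypothetical rewrites_to returns every result obtainable by applying any subset of the configured rewrites, kept in list order (our reading of rewritesTo), and evaluate collects every value tree allowed by the rules for constants, variables, Scope, and Binary.

```python
from itertools import combinations, product

# Hypothetical encoding: float constants, ("var", name), ("opt", e), and
# ("+" | "*", e1, e2). Value trees reuse the same tuples. A rewrite returns
# its input unchanged when it does not apply.


def commute_plus(v):
    return ("+", v[2], v[1]) if isinstance(v, tuple) and v[0] == "+" else v


def fma_intro(v):
    if (isinstance(v, tuple) and v[0] == "+"
            and isinstance(v[1], tuple) and v[1][0] == "*"):
        return ("fma", v[1][1], v[1][2], v[2])
    return v


def rewrites_to(v, rws, opt_ok):
    """All r with (v, cfg) rewritesTo r; only v itself outside opt: scopes."""
    if not opt_ok:
        return {v}
    results = set()
    for k in range(len(rws) + 1):
        for chosen in combinations(rws, k):   # subsets, in configuration order
            r = v
            for rw in chosen:
                r = rw(r)
            results.add(r)
    return results


def evaluate(e, env, rws, opt_ok=False):
    """Set of value trees that e may evaluate to."""
    if isinstance(e, float):
        return {e}
    tag = e[0]
    if tag == "var":
        return {env[e[1]]}
    if tag == "opt":                          # Scope: OptOk := true below here
        return evaluate(e[1], env, rws, True)
    if tag in ("+", "*"):                     # Binary: args first, then rewritesTo
        results = set()
        lhs = evaluate(e[1], env, rws, opt_ok)
        rhs = evaluate(e[2], env, rws, opt_ok)
        for v1, v2 in product(lhs, rhs):
            results |= rewrites_to((tag, v1, v2), rws, opt_ok)
        return results
    raise ValueError(f"unsupported expression {tag!r}")


rws = [commute_plus, fma_intro]
map_body = ("opt", ("+", ("var", "x"), 3.0))
print(evaluate(map_body, {"x": 1.0}, rws))      # both 1.0 + 3.0 and 3.0 + 1.0
print(evaluate(map_body[1], {"x": 1.0}, rws))   # no opt: scope, a single tree
```

Enumerating sets is exponential and only meant to show which results the relation admits; a compiler picks one of them.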

Rule Scope models the fine-grained control over where optimizations are 
applied in the semantics. We store in the current configuration cfg that opti- 
mizations are allowed in the (sub-)expression e (cfg with OptOk := true). 

Evaluation of a conditional (if c then e1 else e2) first evaluates the conditional guard c to a value tree cv. Based on value tree cv the semantics picks a
branch to continue evaluation in. This eager evaluation for conditionals (in con- 
trast to delaying by leaving them in a value tree) is crucial to enable the later 
simulation proof to connect Icing to CakeML, which also eagerly evaluates conditionals. As the value tree cv represents a delayed evaluation of a boolean value,
we have to turn it into a boolean constant when selecting the branch to con- 
tinue evaluation in. This is done using the functions cTree2IEEE and tree2IEEE. 
cTree2IEEE (v) computes the boolean value, and tree2IEEE (v) computes the 
floating-point word represented by the value tree v by applying IEEE 754 arith- 
metic operations and structural recursion. 
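The flattening step can be sketched in Python as follows. The tuple encoding is an assumption, fma is left out for brevity, and division by zero is special-cased because Python raises ZeroDivisionError where IEEE 754 returns an infinity or NaN; tree2ieee and ctree2ieee are hypothetical stand-ins for tree2IEEE and cTree2IEEE.

```python
import math

# Hypothetical value-tree encoding: float leaves, ("neg", v), ("sqrt", v),
# (op, v1, v2) for op in + - * /, and ("opt", v) for the opt: marker.
# Guard trees: bools, ("isNaN", v), (cmp, v1, v2) for cmp in < <= ==,
# and ("opt", cv).


def tree2ieee(v) -> float:
    """Flatten a value tree bottom-up using IEEE 754 double arithmetic."""
    if isinstance(v, float):
        return v
    op, *args = v
    vals = [tree2ieee(a) for a in args]
    if op == "opt":                       # scope marker: value unchanged
        return vals[0]
    if op == "neg":
        return -vals[0]
    if op == "sqrt":                      # IEEE: sqrt of a negative is NaN
        return math.sqrt(vals[0]) if vals[0] >= 0.0 else math.nan
    a, b = vals
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        if b == 0.0:                      # Python raises here; IEEE does not
            if a == 0.0 or math.isnan(a):
                return math.nan
            return math.copysign(math.inf, a) * math.copysign(1.0, b)
        return a / b
    raise ValueError(f"unknown operator {op!r}")


def ctree2ieee(cv) -> bool:
    """Turn a guard value tree into the boolean that selects the branch."""
    if isinstance(cv, bool):
        return cv
    op, *args = cv
    if op == "opt":
        return ctree2ieee(args[0])
    if op == "isNaN":
        return math.isnan(tree2ieee(args[0]))
    a, b = (tree2ieee(x) for x in args)
    if op == "<":
        return a < b                      # comparisons with NaN are False
    if op == "<=":
        return a <= b
    if op == "==":
        return a == b
    raise ValueError(f"unknown comparison {op!r}")
```

Because value trees keep the structure of the computation, 1.0 + 3.0 and 3.0 + 1.0 flatten to the same double, while a reassociated sum such as (0.1 + 0.2) + 0.3 versus 0.1 + (0.2 + 0.3) does not.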


Example. We illustrate Icing semantics and how optimizations are applied both 
in syntax and semantics with the example in Fig. 3. The example first translates 
the input list by 3.0 using a Map, and then computes the norm of the translated 
list with Fold and sqrt. 
We want to apply x + y → y + x (commutativity of +) and fma introduction (x * y + z → fma(x, y, z)) to our example program. Depending on their order, the function rewrite will produce different results.

If we first apply commutativity of +, and then fma introduction, all + operations in our example will be commuted, but no fma will be introduced, as fma introduction syntactically relies on the expression having the structure x * y + z, where x, y, z can be arbitrary. In contrast, if we use the opposite order of rewrites, the second line will be replaced by let vsum = Fold (λx y. fma(x, x, y)) 0.0 v1 and commutativity is only applied in the first line.

To illustrate how the semantics applies optimizations, we run the program on the 2D vector v1 = [1.0, 1.0] in a configuration that contains both rewrites. Consequently, the Map application can produce [1.0 + 3.0, 1.0 + 3.0], [3.0 + 1.0, 1.0 + 3.0], ..., where the terms 1.0 + 3.0 and 3.0 + 1.0 correspond to the value trees representing the addition of 1.0 and 3.0.

If we apply the Fold operation to this list, there are even more possible 
optimization results: 


[(1.0 + 3.0) * (1.0 + 3.0) + (1.0 + 3.0) * (1.0 + 3.0)],

[(3.0 + 1.0) * (3.0 + 1.0) + (3.0 + 1.0) * (3.0 + 1.0)], 

[fma ((3.0 + 1.0), (3.0 + 1.0), (3.0 + 1.0) * (3.0 + 1.0))], 
[fma ((1.0 + 3.0), (1.0 + 3.0), (3.0 + 1.0) * (1.0 + 3.0))], ... 


The first result is the result of evaluating the initial program without any 
rewrites, the second result corresponds to syntactically optimizing with commu- 
tativity of + and then fma introduction, and the third corresponds to using the 
opposite order syntactically. The fourth result can only be obtained by semantic optimization, as commutativity and fma introduction are applied to some intermediate results of Map, but not all. There is no syntactic application of commutativity and fma introduction leading to such a result.


3 Modelling Existing Compilers in Icing 


Having defined the syntax and semantics of Icing, we next implement and prove 
correct functions which model the behavior of previous verified compilers, like 
CompCert or CakeML, and the behavior of unverified compilers, like GCC or 
Clang, respectively. For the former, we first define a translator of Icing expres- 
sions which preserves the IEEE 754 strict meaning of its input and does not allow 
for any further optimizations. Then we give a greedy optimizer that unconditionally optimizes expressions, as observed in GCC and Clang.


3.1 An IEEE 754 Preserving Translator 


The Icing semantics nondeterministically applies optimizations if they are added 
to the configuration. However, when compiling safety-critical code or after apply- 
ing some syntactic optimizations, one might want to preserve the strict IEEE 
754 meaning of an expression. 
To make sure that the behavior of an expression cannot be further changed and thus the expression exhibits strict IEEE 754 compliant behavior, we have implemented the function compileIEEE754, which essentially disallows optimizations by replacing all optimizable expressions opt: e′ with non-optimizable expressions e′. Correctness of compileIEEE754 shows that (a) no optimizations
can be applied after the function has been applied, and (b) evaluation is deter- 
ministic. We have proven these properties as separate theorems. 
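A minimal Python sketch of the idea behind compileIEEE754, assuming a tuple encoding in which ("opt", e) marks an opt: scope; compile_ieee754 and contains_opt are hypothetical names. Property (a) corresponds to contains_opt being False on the output, since without opt: scopes no rewrite may fire.

```python
# Hypothetical expression encoding: ("opt", e) marks an opt: scope; every
# other node is (tag, child, ...), and leaves are floats, bools or strings.


def compile_ieee754(e):
    """Drop every opt: scope so that no rewrite may be applied later."""
    if isinstance(e, tuple):
        if e and e[0] == "opt":
            return compile_ieee754(e[1])
        return tuple(compile_ieee754(c) for c in e)
    return e


def contains_opt(e) -> bool:
    """True if some opt: scope remains anywhere in e."""
    if isinstance(e, tuple):
        return (bool(e) and e[0] == "opt") or any(contains_opt(c) for c in e)
    return False


# The second line of Fig. 3 in this encoding.
prog = ("let", "vsum",
        ("fold", ("lam", ("x", "y"),
                  ("opt", ("+", ("*", ("var", "x"), ("var", "x")), ("var", "y")))),
         0.0, ("var", "v1")),
        ("sqrt", ("var", "vsum")))
compiled = compile_ieee754(prog)
print(contains_opt(prog), contains_opt(compiled))   # True False
```

Running compile_ieee754 twice gives the same result as running it once, which is one informal way to see that nothing is left to optimize.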


3.2 A Greedy Optimizer 


Next, we implement and prove correct an optimizer that mimics the (observed) 
behavior of GCC and Clang as closely as possible. The optimizer applies fma 
introduction, associativity and commutativity greedily. All these rewrites only 
have an application rewrite precondition which we instantiate to True to apply 
the rewrites unconstrained. 

To give an intuition for greedy optimization, recall the example from Fig. 3. 
Greedy optimization does not consider whether applying an optimization is 
beneficial or not. If the optimization is allowed to be applied and it matches 
some subexpression of an optimizable expression, it is applied. Thus the order 
of optimizations matters. Applying the greedy optimizer with the rewrites 
[associativity, commutativity, fma-introduction] to the example, we get:


let v1 = Map (λx. opt:(3.0 + x)) v1 in
let vsum = Fold (λx y. opt:(y + x * x)) 0.0 v1 in sqrt vsum


Only commutativity has been applied as associativity does not match and the 
possibility for an fma-introduction is ruled out by commutativity. If we reverse 
the list of optimizations we obtain: 


let v1 = Map (λx. opt:(3.0 + x)) v1 in
let vsum = Fold (λx y. opt:(fma(x, x, y))) 0.0 v1 in sqrt vsum


which we consider to be a more efficient version of the program from Fig. 3. 

Greedy optimization is implemented in the function optimizeGreedy (rws, e) 
which applies the rewrites in rws in a bottom-up traversal to expression e. In 
combination with the greedy optimizer our fine-grained control (using opt anno- 
tations) allows the end-user to control where optimizations can be applied. 
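The interaction between the bottom-up traversal and opt: scopes can be sketched in Python. This is an illustration under assumptions, not Icing's optimizeGreedy: expressions are tuples with ("opt", e) marking a scope, and each rewrite is a hand-written Python function (commute_plus, fma_intro) rather than a pattern. At every node inside a scope the rewrites are folded in list order, head first.

```python
# Hypothetical encoding: ("var", name), float constants, ("opt", e) for an
# opt: scope, and (operator, arg, ...) for other nodes. A rewrite returns its
# argument unchanged when it does not apply.


def commute_plus(e):
    return ("+", e[2], e[1]) if e[0] == "+" else e


def fma_intro(e):
    if e[0] == "+" and isinstance(e[1], tuple) and e[1][0] == "*":
        return ("fma", e[1][1], e[1][2], e[2])
    return e


def optimize_greedy(rws, e, in_opt=False):
    """Bottom-up traversal that folds rws over every node inside an opt: scope."""
    if not isinstance(e, tuple) or e[0] == "var":
        return e
    if e[0] == "opt":
        return ("opt", optimize_greedy(rws, e[1], True))
    node = (e[0],) + tuple(optimize_greedy(rws, a, in_opt) for a in e[1:])
    if in_opt:
        for rw in rws:              # earlier rewrites in the list fire first
            node = rw(node)
    return node


x, y = ("var", "x"), ("var", "y")
fold_body = ("opt", ("+", ("*", x, x), y))    # opt:(x * x + y)
map_body = ("opt", ("+", x, 3.0))             # opt:(x + 3.0)
plain = ("+", ("*", x, x), y)                 # same sum, no opt: scope

rws = [fma_intro, commute_plus]
print(optimize_greedy(rws, fold_body))   # opt: fma(x, x, y)
print(optimize_greedy(rws, map_body))    # opt: 3.0 + x
print(optimize_greedy(rws, plain))       # unchanged: outside any opt: scope
```

Only code under opt: is touched, which is what makes the unconditional greedy strategy usable in practice.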

We have shown correctness of optimizeGreedy with respect to Icing semantics, 
i.e., we have shown that optimizing greedily gives the same result as applying 
the greedy rewrites in the semantics: 


Theorem 1. optimizeGreedy is correct 

Let E be an environment, v a value tree and cfg a configuration. 

If (cfg, E, optimizeGreedy ([associativity, commutativity, fma-intro], e)) → v, then (cfg with [associativity, commutativity, fma-intro], E, e) → v.


1 As in many verified compilers, Icing’s proofs closely follow the structure of optimiza- 
tions. Achieving this required careful design and many iterations; we consider the 
simplicity of Icing’s proofs to be a strength of this work. 
Proving Theorem 1 without any additional lemmas is tedious as it requires
showing correctness of a single optimization in the presence of other optimiza- 
tions and dealing with the bottom-up traversal applying the optimization at 
the same time. Thus we reduce the proof of Theorem 1 to proving each rewrite 
separately and then chaining together these correctness proofs. Lemma 1 shows 
that applications of the function rewrite can be chained together in the seman- 
tics. This also means that adding, removing, or reordering optimizations simply 
requires changing the list of rewrites, thus making Icing easy to extend. 


Lemma 1. rewrite is compositional 

Let e be an expression, v a value tree, s → t a rewrite, and rws a list of rewrites. If the rewrite s → t can be correctly simulated in the semantics, and the list rws can be correctly simulated in the semantics, then the list of rewrites (s → t)::rws can be correctly simulated in the semantics.


4 A Conditional Optimizer 


We have implemented an IEEE 754 optimizer which has the same behavior as 
CompCert and CakeML, and a greedy optimizer with the (observed) behavior 
of GCC and Clang. The fine-grained control of where optimizations are applied 
is essential for the usability of the greedy optimizer. However, in this section 
we explain that the control provided by the opt annotation is often not enough. 
We show how preconditions can be used to provide additional constraints on 
where rewrites can be applied, and sketch how preconditions serve as an interface 
between the compiler and external tools, which can and should discharge them. 

We observe that in many cases, whether an optimization is acceptable or 
not can be captured with a precondition on the optimization itself, and not on 
every arithmetic operation separately. One example of such an optimization is the removal of NaN checks: a check for a NaN should only be removed if the check never succeeds.

We argue that both application and compiler rewrite preconditions should 
be discharged by external tools. Many interesting preconditions for a rewrite 
depend on a global analysis. Running a global analysis as part of a compiler 
is infeasible, as maintaining separate analyses for each rewrite is not likely to 
scale. We thus propose to expose an interface to external tools in the form of 
preconditions. 

We implement this idea in the conditional optimizer optimizeCond that sup- 
ports three different applications of fast-math optimizations: applying optimiza- 
tions rws unconstrained (uncond rws), applying optimizations if precondition P 
is true (cond P rws), and applying optimizations under the assumptions generated by a function A, which should be discharged externally (assume A rws). When
applying cond, optimizeCond checks whether precondition P is true before opti- 
mizing, whereas for assume the propositions returned by A are assumed, and 
should then be discharged separately by a static analysis or a manual proof. 
Correctness of optimizeCond relates syntactic optimizations to applying optimizations in the semantics. Similar to optimizeGreedy, we designed the proof
modularly such that it suffices to prove correct each rewrite individually. 

Our optimizer optimizeCond takes as arguments first a list of rewrite applications built with uncond, cond, and assume, and then an expression e. If the list is empty, we have optimizeCond ([], e) = e. Otherwise, the rewrite application at the head of the list is applied in a bottom-up traversal to e and optimization continues recursively. For uncond,
the rewrites are applied if they match; for cond P rws the precondition P is 
checked for the expression being optimized and the rewrites rws are applied if P 
is true; for assume A rws, the function A is evaluated on the expression being opti- 
mized. If execution of A fails, no optimization is applied. Otherwise, A returns a 
list of assumptions which are logged by the compiler and the rewrites are applied. 
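The traversal described above can be modelled in a few lines of Python. This is a minimal sketch under our own assumptions, not Icing's HOL4 definition: expressions are nested tuples, a rewrite is a function that returns the rewritten node or None when it does not match, and the names `uncond`, `cond`, `assume`, `optimize_cond` and `comm_add` are hypothetical stand-ins for the constructs in the text. Each rewrite application performs a single bottom-up pass.

```python
from typing import Callable, List, Optional

# Toy AST (hypothetical): ("var", name) | ("const", value) | (op, lhs, rhs)
Rewrite = Callable[[tuple], Optional[tuple]]  # None means "does not match"

def uncond(rws: List[Rewrite]):
    return ("uncond", None, rws)

def cond(pre, rws: List[Rewrite]):
    return ("cond", pre, rws)

def assume(gen, rws: List[Rewrite]):
    return ("assume", gen, rws)

def apply_rws(rws, e):
    """Try the rewrites at the root in order; keep the first match."""
    for rw in rws:
        res = rw(e)
        if res is not None:
            return res
    return e

def bottom_up(step, e):
    """Optimize the children first, then the node itself."""
    if e[0] in ("var", "const"):
        return step(e)
    op, lhs, rhs = e
    return step((op, bottom_up(step, lhs), bottom_up(step, rhs)))

def optimize_cond(apps, e, log):
    """Apply each rewrite application in turn; assumptions are appended to log."""
    if not apps:
        return e  # optimizeCond ([], e) = e
    kind, extra, rws = apps[0]

    def step(node):
        if kind == "uncond":
            return apply_rws(rws, node)
        if kind == "cond":  # extra is the precondition P
            return apply_rws(rws, node) if extra(node) else node
        try:                # kind == "assume": extra is the function A
            assumptions = extra(node)
        except Exception:
            return node     # A failed: no optimization at this node
        log.extend(assumptions)
        return apply_rws(rws, node)

    return optimize_cond(apps[1:], bottom_up(step, e), log)

# Example rewrite (hypothetical): commutativity of +.
def comm_add(e):
    return ("+", e[2], e[1]) if e[0] == "+" else None

log = []
print(optimize_cond([uncond([comm_add])], ("+", ("var", "x"), ("var", "y")), log))
```

Following the description above, this sketch logs the assumptions produced by A at every node where A succeeds, whether or not a rewrite then matches; an actual implementation might choose to record only assumptions for rewrites that fire.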

Using the interface provided by preconditions, one can prove external theo- 
rems showing additional properties of a compiler run using application rewrite 
preconditions, and external theorems showing how to discharge compiler rewrite 
preconditions with static analysis tools or a manual proof. We will call such 
external theorems meta theorems. 

In the following we discuss two possible meta theorems, highlighting key 
steps required for implementing (and proving) them. A complete implementation 
consists of two connections: (1) from the compiler to rewrite preconditions and 
(2) from rewrite preconditions to external tools. We implement (1) independently 
of any particular tool. A complete implementation of (2) is beyond the scope of this
paper; meta theorems generally depend on global analyses which are orthogonal 
to designing Icing, but several external tools already provide functionality that is 
a close match to our interface and we sketch possible connections below. We note 
that for these meta theorems, optimizeCond should track the context in which 
an assumption is made and use the context to express assumptions as local 
program properties. Our current optimizeCond implementation does not collect 
this contextual information yet, as this information at least partially depends 
on the particular meta theorems desired. 


4.1 A Logging Compiler for NaN Special Value Checks 


We show how a meta theorem can be used to discharge a compiler rewrite
precondition using the example of removing a NaN check. Removing a NaN check can,
in general, be unsound if the check could have succeeded. Inferring statically
whether a value can be a NaN special value requires either a global static
analysis or a manual proof over all possible executions.

Preconditions are our interface to external tools. For NaN check removal, we
implement a function removeNaNcheck e that returns the assumption that no NaN
special value can be the result of evaluating the argument expression e. Function
removeNaNcheck can then be used as part of an assume rule for optimizeCond.
We prove a strengthened correctness theorem for NaN check removal, showing
that if the assumption returned by removeNaNcheck is discharged externally (i.e.
by the end-user or via static analysis), then we can simulate applying NaN check
removal syntactically in Icing semantics without additional side conditions.
The assumption from removeNaNcheck is additionally returned as the result of
optimizeCond, since the optimization is only correct under this assumption. Such
assumptions can be discharged by static analyzers like Verasco [22] or Gappa [17].
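The assume-based removal of NaN checks can be illustrated with a toy model. In this Python sketch, `remove_nan_check` plays the role of removeNaNcheck by rewriting isNaN(e) to false and returning the assumption that e is never NaN, and `no_nan` is a deliberately crude stand-in for an external analyzer (it is not the interface of Verasco or Gappa). It assumes that inputs range over finite intervals and that expressions use only +, - and *, for which NaN can only arise from an infinite intermediate value; it therefore checks, using exact real-valued interval bounds and a factor-2 safety margin, that no subexpression comes close to overflow.

```python
import sys
from fractions import Fraction

# Toy AST (hypothetical): ("var", name) | ("const", float) | (op, lhs, rhs)
# with op in {"+", "-", "*"}, plus ("isnan", e) and ("false",).

def remove_nan_check(e):
    """Rewrite isNaN(e) to False and return the assumption 'e is never NaN'."""
    if e[0] == "isnan":
        return ("false",), [("not_nan", e[1])]
    return e, []

def interval(e, env):
    """Exact real-valued interval of e, given input ranges env[name] = (lo, hi)."""
    if e[0] == "var":
        lo, hi = env[e[1]]
        return Fraction(lo), Fraction(hi)
    if e[0] == "const":
        return Fraction(e[1]), Fraction(e[1])
    (a, b), (c, d) = interval(e[1], env), interval(e[2], env)
    if e[0] == "+":
        return a + c, b + d
    if e[0] == "-":
        return a - d, b - c
    if e[0] == "*":
        ps = [a * c, a * d, b * c, b * d]
        return min(ps), max(ps)
    raise ValueError("unsupported operator: " + e[0])

# Factor-2 margin below the largest double: a crude allowance for rounding.
LIMIT = Fraction(sys.float_info.max) / 2

def no_nan(e, env):
    """Crude discharge: with finite inputs, +, -, * only yield NaN via an
    infinite intermediate (inf - inf, 0 * inf), so require every
    subexpression to stay well below the overflow threshold."""
    lo, hi = interval(e, env)
    if max(abs(lo), abs(hi)) > LIMIT:
        return False
    if e[0] in ("var", "const"):
        return True
    return no_nan(e[1], env) and no_nan(e[2], env)

check = ("isnan", ("*", ("var", "x"), ("+", ("var", "y"), ("const", 1.0))))
new_e, assumptions = remove_nan_check(check)
env = {"x": (-1e3, 1e3), "y": (0.0, 1e3)}
print(new_e, all(no_nan(a[1], env) for a in assumptions))
```

The split mirrors the text: the optimizer only produces and records the assumption, and a separate tool decides whether it holds.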


4.2 Proving Roundoff Error Improvement 


Rewrites like associativity and distributivity change the results of floating-point 
programs. One way of capturing this behavior for a single expression is to com- 
pute the roundoff error, i.e. the difference between an idealized real-valued and 
a floating-point execution of the expression. 

To compute an upper bound on the roundoff error, various formally verified 
tools have been implemented [3,17,30,37]. A possible meta theorem is thus to 
show that applying a particular list of optimizations does not increase the round- 
off error of the optimized expression but only decreases or preserves it. The meta 
theorem for this example would show that (a) all the applied syntactic rewrites 
can be simulated in the semantics and (b) the worst-case roundoff error of the 
optimized expression is smaller than or equal to the error of the input expression. Our
development already proves (a) and we sketch the steps necessary to show (b) 
below. 

We can leverage these roundoff error analysis tools as application
preconditions in a cond rule, checking whether a rewrite should be applied or not
in optimizeCond. For a particular expression e, an application precondition
check (s → t, e) would return true if applying rewrite s → t does not increase
the roundoff error of e.


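Theorem 2. check decreases roundoff error

$$
\begin{aligned}
&(\mathit{cfg}, E, \mathtt{optimizeCond}\ (\mathtt{Cond}\ (\lambda e.\ \mathtt{check}\ (s \to t, e)))\ e) \Downarrow v \implies \\
&\quad (\mathit{cfg}\ \mathtt{with}\ \mathit{opts} := \mathit{cfg}.\mathit{opts} \cup \{s \to t\}, E, e) \Downarrow v \;\land \\
&\quad \mathtt{error}\ e \geq \mathtt{error}\ (\mathtt{optimizeCond}\ (\mathtt{Cond}\ (\lambda e.\ \mathtt{check}\ (s \to t, e)))\ e)
\end{aligned}
$$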


Implementing check (s → t, e) requires computing a roundoff error for
expression e and one for e rewritten with s → t, and returning true if and only
if the roundoff error has not increased by applying the rewrite. Proving the
theorem would require giving a real-valued semantics for Icing, connecting Icing’s
semantics to the semantics of the roundoff error analysis tool, and a global range 
analysis on the Icing programs, which can be provided by Verasco or Gappa. 
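A check of this kind can be sketched with a simple, unverified error analysis in place of the verified tools cited above. The Python sketch below assumes the standard rounding model |fl(a ∘ b) − (a ∘ b)| ≤ ε|a ∘ b| with ε = 2⁻⁵³ (binary64, round to nearest, no underflow or overflow), treats program variables as exact inputs with known ranges, and propagates an interval together with an absolute error bound through + and *. The names `analyze`, `check` and `assoc_right` are hypothetical.

```python
from fractions import Fraction

EPS = Fraction(1, 2**53)  # unit roundoff of binary64, round to nearest

def mag(iv):
    return max(abs(iv[0]), abs(iv[1]))

def analyze(e, env):
    """(real interval, absolute roundoff bound) for a toy AST
    ("var", name) | ("const", value) | ("+" | "*", lhs, rhs).
    Standard model |fl(a op b) - (a op b)| <= EPS * |a op b|; no under/overflow."""
    if e[0] == "var":
        lo, hi = env[e[1]]
        return (Fraction(lo), Fraction(hi)), Fraction(0)  # inputs taken as exact
    if e[0] == "const":
        v = Fraction(e[1])  # exact value of the binary64 constant
        return (v, v), Fraction(0)
    (iv1, e1), (iv2, e2) = analyze(e[1], env), analyze(e[2], env)
    (a, b), (c, d) = iv1, iv2
    if e[0] == "+":
        iv, prop = (a + c, b + d), e1 + e2
    elif e[0] == "*":
        ps = [a * c, a * d, b * c, b * d]
        iv = (min(ps), max(ps))
        prop = mag(iv1) * e2 + mag(iv2) * e1 + e1 * e2
    else:
        raise ValueError("unsupported operator: " + e[0])
    # Error propagated from the operands plus rounding of this operation.
    return iv, prop + EPS * (mag(iv) + prop)

def check(rewrite, e, env):
    """True iff `rewrite` matches e and does not increase the error bound."""
    rewritten = rewrite(e)
    if rewritten is None:
        return False
    return analyze(rewritten, env)[1] <= analyze(e, env)[1]

# Hypothetical rewrite: (a + b) + c -> a + (b + c)
def assoc_right(e):
    if e[0] == "+" and e[1][0] == "+":
        return ("+", e[1][1], ("+", e[1][2], e[2]))
    return None

env = {"a": (0, 1000), "b": (0, 1), "c": (0, 1)}
left = ("+", ("+", ("var", "a"), ("var", "b")), ("var", "c"))
print(check(assoc_right, left, env))
```

With a ∈ [0, 1000] and b, c ∈ [0, 1], the bound for a + (b + c) is smaller than the one for (a + b) + c, so this check accepts the rewrite in that direction and would reject its reverse.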


4.3 Supporting Distributivity in optimizeCond 


The rewrites considered up to this point do not duplicate any subexpressions
in the optimized output. In this section, we consider rewrites which do
introduce additional occurrences of subexpressions, which we dub duplicative rewrites.
Common duplicative rewrites are distributivity of * over + (x * (y + z) →
x * y + x * z) and rewriting a single multiplication into multiple additions
($x * n \to \sum_{i=1}^{n} x$). Here we consider distributivity as an example. A compiler
might want to use this optimization to apply further strength reductions or fma
introduction.
The main issue with duplicative rewrites is that they add new occurrences of
a matched subexpression. Applying (x * (y + z) → x * y + x * z) to e1 * (2 + x)
returns e1 * 2 + e1 * x. The values of the two occurrences of e1 may differ
because of further optimizations applied to only one of its occurrences.
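A concrete instance of such a mismatch, using values of our own choosing: suppose e1 = (a + b) + c and a later reassociation fires in only one of the two copies. In binary64 the two copies then evaluate to different numbers, so no single evaluation of e1 in the original expression corresponds to both.

```python
a, b, c = 0.1, 0.2, 0.3

# Two occurrences of e1 = (a + b) + c after distributivity duplicated it.
e1_copy1 = (a + b) + c   # left unchanged
e1_copy2 = a + (b + c)   # reassociated by a later rewrite

print(e1_copy1)  # 0.6000000000000001
print(e1_copy2)  # 0.6
print(e1_copy1 == e1_copy2)  # False
```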

Any correctness proof for such a duplicative rewrite must match up 
the two (potentially different) executions of e1 in the optimized expres- 
sion (e1 * 2 + e1 * x) with the execution of e1 in the initial expression 
(e1 * (2 + x)). This can only be achieved by finding a common
intermediate optimization (resp. evaluation) result shared by both subexpressions of
e1 * 2 + e1 * x.

In general, the existence of such an intermediate result can only be proven
for expressions that do not depend on “eager” evaluation, i.e. expressions that
consist only of let-bindings and arithmetic. We illustrate the problem using a conditional
(if c then e1 else e2). In Icing semantics, the guard c is first evaluated to a 
value tree cv. Next, the semantics evaluates cv to a boolean value b using function 
cTree2IEEE. Computing b from cv loses the structural information of value tree 
cv by computing the results of previously delayed arithmetic operations. This 
loss of information means that rewrites that previously matched the structure 
of cv may no longer apply to b. 

This is not a bug in the Icing semantics. On the contrary, our semantics makes 
this issue explicit, while in other compilers it can lead to unexpected behavior 
(e.g., in GCC’s support for distributivity under fast-math). CakeML, for exam- 
ple, also eagerly evaluates conditionals and similarly loses structural information 
about optimizations that otherwise may have been applied. Having lazy condi- 
tionals in general would only “postpone” the issue until eager evaluation of the 
conditional expression for a loop is necessary. 

An intuitive compiler precondition that enables proving duplicative rewrites 
is to forbid any control dependencies on the expression being optimized. How- 
ever, this approach may be unsatisfactory as it disallows branching on the results 
of optimized expressions and requires a verified dependency analysis that must 
be rerun or incrementally updated after every rewrite, and thus could become 
a bottleneck for fast-math optimizers. Instead, in Icing we restrict duplicative 
rewrites to only fire when pattern variables are matched against program
variables, e.g., pattern variables a, b, c only match against program variables x, y, z.
This restriction to only matching let-bound variables is more scalable, as it can 
easily be checked syntactically, and allows us to loosen the restriction on control- 
flow dependence by simply let-binding subexpressions as needed. 
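The syntactic restriction can be sketched in Python over a toy tuple AST in which ("var", name) nodes stand for let-bound program variables. The helpers `distribute`, `let_bind` and `under_lets`, and the fresh names t0, t1, are hypothetical illustrations rather than Icing functions: the rewrite fires only when all pattern variables are bound to program variables, and let-binding the remaining operands makes it applicable again.

```python
def is_var(e):
    return e[0] == "var"

def distribute(e):
    """a * (b + c) -> a * b + a * c, only if every pattern variable
    (a, b, c) is matched against a program variable."""
    if e[0] == "*" and e[2][0] == "+":
        a, (_, b, c) = e[1], e[2]
        if all(is_var(p) for p in (a, b, c)):
            return ("+", ("*", a, b), ("*", a, c))
    return None

def let_bind(e):
    """For e = a * (b + c), let-bind each non-variable operand to a fresh
    variable t0, t1, ... (hypothetical naming) so that distribute can fire."""
    a, (_, b, c) = e[1], e[2]
    binds = []

    def name(p):
        if is_var(p):
            return p
        v = ("var", "t%d" % len(binds))
        binds.append((v[1], p))
        return v

    body = ("*", name(a), ("+", name(b), name(c)))
    for v, rhs in reversed(binds):
        body = ("let", v, rhs, body)
    return body

def under_lets(f, e):
    """Apply f to the expression beneath any enclosing let-bindings."""
    if e[0] == "let":
        return ("let", e[1], e[2], under_lets(f, e[3]))
    return f(e)

e = ("*", ("*", ("var", "u"), ("var", "w")), ("+", ("const", 2.0), ("var", "x")))
print(distribute(e))  # None: u * w and 2.0 are not program variables
print(under_lets(lambda b: distribute(b) or b, let_bind(e)))
```

After let-binding, both products refer to the same variable t0, so the duplicated operand is evaluated once and cannot be optimized differently in the two copies.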


5 Connecting to CakeML 


Fig. 4. Simulation diagram for Icing and the designed optimizers (diagram labels: non-deterministic Icing with rewriting; given syntactic rewrites on expression e, if v is the result of evaluating e; equivalence of exact floating-point behaviour; CakeML source).

We have shown how to apply optimizations in Icing and how to use it to preserve
IEEE 754 semantics. Next, we describe how we connected Icing to an existing
verified compiler by implementing a translation from Icing source to CakeML
source and showing an equivalence theorem.² The translation function toCML
maps Icing syntax to CakeML syntax. We highlight the most interesting cases.
The translations of Ith, Map, and Fold relate an Icing execution to a predefined
function from the CakeML standard library. We show separate theorems relating
executions of list operations in Icing to CakeML closures of library functions.
The predicate isNaN e is implemented as toCML e <> toCML e. The predicate is
true in Icing semantics if and only if e is a NaN special value. Recall that
floating-point NaN values are unordered and compare unequal even to themselves;
thus we implement isNaN by comparing e with itself.
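The self-comparison behind the isNaN translation holds for any IEEE 754 binary format. The following Python sketch uses Python floats (binary64) rather than CakeML, and `is_nan` is our own helper name; it compares the result of `x != x` with the library predicate `math.isnan` on a few special values.

```python
import math

def is_nan(x: float) -> bool:
    # IEEE 754: NaN is unordered with every value, including itself,
    # so x != x holds exactly when x is NaN (mirrors toCML e <> toCML e).
    return x != x

for v in [0.0, -1.5, math.inf, -math.inf, math.nan, math.inf - math.inf]:
    assert is_nan(v) == math.isnan(v)
print(is_nan(math.inf - math.inf))  # True
```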

To show that our translation function toCML correctly translates Icing pro- 
grams into CakeML source, we proved a simulation between the two semantics, 
illustrated in Fig. 4. The top part consists of the correctness theorems we have
shown for the optimizers, relating syntactic optimization to semantic rewrit- 
ing. In the bottom part we relate a deterministic Icing execution which does 
not apply optimizations to CakeML source semantics and prove an equivalence. 
For the backward simulation between CakeML and Icing we require the Icing 
program to be well-typed which is independently checked. 


6 Related Work 


Verified Compilation of Floating-Point Programs. CompCert [25] uses a con- 
structive formalization of IEEE 754 arithmetic [6] based on Flocq [7] which 
allows for verified constant propagation and strength reduction optimizations 
for divisions by powers of 2 and replacing x × 2 by x + x. The situation is similar
for CakeML [38] whose floating-point semantics is based on HOL’s [19,20]. With 
Icing, we propose a semantics which allows important floating-point rewrites in 
a verified compiler by allowing users to specify a larger set of possible behaviors 
for their source programs. The precondition mechanism serves as an interface 
to external tools. While Icing is implemented in HOL, our techniques are not 
specific to higher-order logic or the details of CakeML and we believe that an 
analog of our “verified fast-math” approach could easily be ported to CompCert. 

The Alive framework [27] has been extended to verify floating-point peephole
optimizations [29,31]. While these tools relax some exceptional (NaN) cases,
most optimizations still need to preserve “bit-for-bit” IEEE 754 behavior, which
precludes valuable rewrites like the fma introductions Icing supports.


2 We also extended the CakeML source semantics with an fma operation, as CakeML’s
compilation currently does not support mapping fma’s to hardware instructions.


Optimization of Floating-Point Programs. ‘Mixed-precision tuning’ can increase 
performance by decreasing precision at the expense of accuracy, for instance from 
double to single floating-point precision. Current tools [11,13,16,35] ensure that
a user-provided error bound is satisfied either through dynamic or static analysis. 
In this work, we consider only uniform 64-bit floating-point precision, but Icing’s 
optimizations are equally applicable to other precisions. Optimizations such as 
mixed-precision tuning are, however, beyond the scope of a compiler setting, as they
require error bound annotations for kernel functions. 

Spiral [33] uses real-valued linear algebra identities for rewriting at the algo- 
rithmic level to choose a layout which provides the best performance for a par- 
ticular platform, but due to operation reordering is not IEEE 754 semantics 
preserving. Herbie [32] optimizes for accuracy rather than performance, applying
rewrites which are mostly based on real-valued identities. The optimizations
performed by Spiral and Herbie go beyond what traditional compilers perform, 
but they fit our view that it is sometimes beneficial to relax the strict IEEE 
754 specification, and could be considered in an extended implementation of 
Icing. On the other hand, STOKE’s floating-point superoptimizer [36] for x86 
binaries does not preserve real-valued semantics, and only provides approximate 
correctness using dynamic analysis. 


Analysis and Verification of Floating-Point Programs. Static analyses for
bounding roundoff errors of finite-precision computations w.r.t. a real-valued
semantics [15,17,18,28,30,37] (some with formal certificates in Coq or HOL) are
currently limited to short, mostly straight-line functions and require fine-grained
domain annotations at the function level. Whole-program accuracy can be
formally verified w.r.t. a real-valued implementation with substantial user
interaction and expertise [34]. Verification of elementary function implemen- 
tations has also recently been automated, but requires substantial compute 
resources [23]. 

On the other hand, static analyses aiming to verify the absence of run- 
time exceptions like division by zero [4,10,21,22] scale to realistic programs. 
We believe that such tools can be used to satisfy preconditions and thus Icing 
would serve as an interface between the compiler and such specialized verification 
techniques. 

The KLEE symbolic execution engine [9] has support for floating-point pro- 
grams [26] through an interface to Z3’s floating-point theory [8]. This theory is 
also based on IEEE 754 and will thus not be able to verify the kind of optimiza- 
tions that Icing supports. 


7 Conclusion 


We have proposed a novel semantics for IEEE 754-unsound floating-point com- 
piler optimizations which allows them to be applied in a verified compiler setting 
and which captures the intuitive semantics developers often use today when
reasoning about their floating-point code. Our semantics is nondeterministic in order
to provide the compiler the freedom to apply optimizations where they are useful 
for a particular application and platform—but within clearly defined bounds. The 
semantics is flexible from the developer’s perspective, as it provides fine-grained 
control over which optimizations are available and where in a program they can 
be applied. We have presented a formalization in HOL4, implemented three pro- 
totype optimizers, and connected them to the CakeML verified compiler frontend. 
For our most general optimizer, we have explained how it can be used to obtain 
meta-theorems for its results by exposing a well-defined interface in the form of 
preconditions. We believe that our semantics can be integrated fully with different 
verified compilers in the future, and bridge the gap between compiler optimiza- 
tions and floating-point verification techniques. 
Abstract. Elementary function calls are a common feature in numerical
programs. While their implementations in mathematical libraries are
highly optimized, function evaluation is nonetheless very expensive com- 
pared to plain arithmetic. Full accuracy is, however, not always needed. 
Unlike arithmetic, where the performance difference between for example 
single and double precision floating-point arithmetic is relatively small, 
elementary function calls provide a much richer tradeoff space between 
accuracy and efficiency. Navigating this space is challenging, as guar- 
anteeing the accuracy and choosing correct parameters for good perfor- 
mance of approximations is highly nontrivial. We present a fully auto- 
mated approach and a tool which approximates elementary function calls 
inside small programs while guaranteeing overall user-given error bounds.
Our tool leverages existing techniques for roundoff error computation 
and approximation of individual elementary function calls and provides 
an automated methodology for the exploration of parameter space. Our 
experiments show that significant efficiency improvements are possible 
in exchange for reduced, but guaranteed, accuracy. 


1 Introduction 


Numerical programs face an inherent tradeoff between accuracy and efficiency. 
Choosing a larger finite precision provides higher accuracy, but is generally more 
costly in terms of memory and running time. Not all applications, however, need 
a very high accuracy to work correctly. We would thus like to compute the results 
with only as much accuracy as is needed, in order to save resources. 

Navigating this tradeoff between accuracy and efficiency is challenging. First, 
estimating the accuracy, i.e. bounding roundoff and approximation errors, is non- 
trivial due to the complex nature of finite-precision arithmetic which inevitably 
occurs in numerical programs. Second, the space of possible implementations is 
usually prohibitively large and thus cannot be explored manually. 

Today, users can choose between different automated tools for analyzing 
accuracy of floating-point programs [7,8, 11,14, 18,20,26] as well as for choosing 
between different precisions [5,6,10]. The latter tools perform mixed-precision 
tuning, i.e. they assign different floating-point precisions to different operations,
and can thus improve the performance w.r.t. a uniform precision implementation.
The success of such an optimization is, however, limited to the case
when uniform precision is just barely not enough to satisfy a given accuracy
specification.

© The Author(s) 2019

Another possible target for performance optimizations are elementary func- 
tions (e.g. sin, exp). Users by default choose single- or double-precision libm 
library function implementations, which are fully specified in the C language 
standard (ISO/IEC 9899:2011) and provide high accuracy. Such implementa- 
tions are, however, expensive. When high accuracy is not needed, we can save 
significant resources by replacing libm calls with coarser approximations, opening
up a larger and different tradeoff space than mixed-precision tuning. Unfortunately,
existing automated approaches [1,25] do not provide accuracy guarantees.

On the other hand, tools like Metalibm [3] approximate individual elementary
functions by polynomials with rigorous guarantees on a user-given accuracy.
They, however, do not consider entire programs and leave the selection of their
parameters to the user, limiting their usability mostly to experts.

We present an approach and a tool which leverages the existing whole-program 
error analysis of Daisy [8] and Metalibm’s elementary function approximation to 
provide both sound whole-program guarantees as well as efficient C implementa- 
tions for floating-point programs with elementary function calls. Given a target 
error specification, our tool automatically distributes the error budget among uni- 
form single or double precision arithmetic operations and elementary functions, 
and selects a suitable polynomial degree for their approximation. 

We have implemented our approach inside the tool Daisy and compare the 
performance of generated programs against programs using libm on examples 
from literature. The benchmarks spend on average 38%, and up to 50%, of their
running time evaluating elementary functions. Our tool improves the overall
performance by on average 14% and up to 25% when approximating each elementary
function call individually, and on average 17% and up to 31% when approximating
compound function calls. These improvements were achieved solely by optimizing
approximations to elementary functions and illustrate the pertinence of our
approach. These performance improvements incur overall whole-program errors
which are only two to three orders of magnitude larger than those of double-precision
implementations using libm functions and are well below the errors of single-precision
implementations. Our tool thus allows users to effectively trade larger, but
guaranteed, error bounds for better performance.


Contributions. In summary, in this paper we present: (1) the first approximation 
technique for elementary functions with sound whole-program error guarantees, 
(2) an experimental evaluation on benchmarks from literature, and (3) an imple- 
mentation, which is available at https://github.com/malyzajko/daisy. 


Related Work. Several static analysis tools bound roundoff errors of floating- 
point computations [7,18,20,26], assuming libm implementations, or verify the 
correctness of several functions in Intel’s libm library [17]. Muller [21] provides 
a good overview of the approximation of elementary functions. Approaches for
improving the performance of numerical programs include mixed-precision
tuning [5,6,10,16,24] and autotuning, which performs low-level real-value
semantics-preserving transformations [23,27]. These leverage a different part of the
tradeoff space than libm approximation and are thus orthogonal. Herbie [22] and
Sardana [7] improve accuracy by rewriting the non-associative finite-precision arith-
metic, which is complementary to our approach. Approaches which approximate 
entire numerical programs include MCMC search [25], enumerative program syn- 
thesis [1] and neural approximations [13]. Accuracy is only checked on a small set 
of sample inputs and is thus not guaranteed. 


2 Our Approach 


We explain our approach using the following example [28] computing a forward 
kinematics equation and written in Daisy’s real-valued specification language: 


def forwardk2jY(theta1: Real, theta2: Real): Real = {
  require(-3.14 <= theta1 && theta1 <= 3.14 && -3.14 <= theta2 && theta2 <= 3.14)
  val l1: Real = 0.5; val l2: Real = 2.5
  l1 * sin(theta1) + l2 * sin(theta1 + theta2)
} ensuring(res => res +/- 1e-11)


Although this program is relatively simple, it still presents an opportunity for 
performance savings, especially when it is called often, e.g. during the motion of 
a robotics arm. Assuming double-precision floating-point arithmetic and library 
implementations for sine, Daisy’s static analysis determines the worst-case abso- 
lute roundoff error of the result to be 3.44e-15. This is clearly a much smaller 
error than what the user requested (1e-11) in the postcondition (ensuring clause). 

The two elementary function calls to sin account for roughly 40.7% of the 
overall running time. We can save some of this running time using polynomial 
approximations, which our tool generates in less than 6 min. The new
double-precision C implementation is roughly 15.6% faster than one with libm¹ functions,
i.e. using around 40% of the available margin. This is a noteworthy performance
improvement, considering that we optimized only the evaluation of elementary
functions. The actual error of the approximate implementation is 1.56e-12,
i.e. roughly three orders of magnitude higher than the libm error. This error is 
still much smaller than if we had used a uniform single precision implementation, 
which incurs a total error of 1.85e-6. 

We implement our approach inside the Daisy framework [8], combining 
Daisy’s static dataflow analysis for bounding finite-precision roundoff errors, 
Metalibm’s automated generation of efficient polynomial approximations, as 
well as a novel error distribution algorithm. Our tool furthermore automatically 
selects a suitable polynomial degree for approximations to elementary functions. 


1 There are various different implementations of libm that depend on the operating 
system and programming language. Here when referring to libm we mean the GNU 
libc implementation (https://www.gnu.org/software/libc/). 
Unlike previous work, our tool guarantees that the user-specified error is
satisfied. It soundly distributes the overall error budget among arithmetic operations
and libm calls using Daisy’s static analysis. Metalibm uses the state-of-the-art
minimax polynomial approximation algorithm [2], and Sollya [4] and Gappa [12]
to bound the errors of its implementations. Given a function, a target relative
error bound and implementation parameters, Metalibm generates C code. Our 
tool does not guarantee to find the most efficient implementation; the search 
space of implementation and approximation choices is highly complex and dis- 
crete, and it is thus infeasible to find the optimal parameters. 

The input to our tool is a straight-line program² with standard arithmetic
operators (+, −, *, /) as well as the most commonly used elementary functions
(sin, cos, tan, log, exp, √). The user further specifies the domains of all inputs,
together with a target overall absolute error which must be satisfied. The output 
is C code with arithmetic operations in uniform single or double precision, and 
libm approximations in double precision (Metalibm’s only supported precision). 


Algorithm. We will use ‘program’ for the entire expression, and ‘function’ for 
individual elementary functions. Our approach works in the following steps. 

Step 1 We re-use Daisy’s frontend which parses the input specification. We 
add a pre-processing step, which decomposes the abstract syntax tree (AST) of 
the program we want to approximate such that each elementary function call is 
assigned to a fresh local variable. This transformation eases the later replacement 
of the elementary functions with an approximation. 
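The pre-processing of Step 1 resembles a conversion to administrative normal form that only names elementary function calls. The Python sketch below works on a hypothetical tuple AST rather than Daisy's Scala data structures, and the fresh-name scheme `_tmp0`, `_tmp1`, ... is also our own. Applied to the running example, it yields two local definitions, one per call to sin.

```python
ELEMENTARY = {"sin", "cos", "tan", "log", "exp", "sqrt"}

def hoist_calls(e, bindings):
    """Replace each elementary call in e by a fresh local variable.
    AST (hypothetical): ("var", n) | ("const", v) | (op, lhs, rhs)
    | ("call", fn, arg). Definitions are appended to `bindings` in
    evaluation order, so inner calls come before the calls that use them."""
    tag = e[0]
    if tag in ("var", "const"):
        return e
    if tag == "call":
        fn = e[1]
        if fn not in ELEMENTARY:
            raise ValueError("unsupported function: " + fn)
        arg = hoist_calls(e[2], bindings)
        name = "_tmp%d" % len(bindings)
        bindings.append((name, ("call", fn, arg)))
        return ("var", name)
    op, lhs, rhs = e
    return (op, hoist_calls(lhs, bindings), hoist_calls(rhs, bindings))

# Running example: l1 * sin(theta1) + l2 * sin(theta1 + theta2)
prog = ("+",
        ("*", ("var", "l1"), ("call", "sin", ("var", "theta1"))),
        ("*", ("var", "l2"),
         ("call", "sin", ("+", ("var", "theta1"), ("var", "theta2")))))
binds = []
body = hoist_calls(prog, binds)
for name, rhs in binds:
    print("val", name, "=", rhs)
print(body)
```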

Step 2 We use Daisy’s roundoff error analysis on the entire program, assum- 
ing a libm implementation of elementary functions. This analysis computes a 
real-valued range and a worst-case absolute roundoff error bound for each subex- 
pression in the AST, assuming uniform single or double precision as appropriate. 
We use this information in the next step to distribute the error and to determine 
the parameters for Metalibm for each function call. 

Step 3 This is the core step, which calls Metalibm to generate a (piece- 
wise) polynomial approximation for each elementary function which was assigned 
to a local variable. Each call to Metalibm specifies the local target error for 
each function call, the polynomial degree and the domain of the function call 
arguments. To determine the argument domains, we use the range and error 
information obtained in the previous step. Our tool tries different polynomial 
degrees and selects the fastest implementation. We explain our error distribution 
and polynomial selection further below. 

Metalibm generates efficient double-precision C code including argument 
reduction (if applicable), domain splitting, and polynomial approximation with 
a guaranteed error below the specified target error (or returns an error). Met- 
alibm furthermore supports approximations with lookup tables, whose size the 
user can control manually via our tool frontend as well. 


2 All existing approaches for analysing floating-point roundoff errors which handle
loops or conditional branches reduce the reasoning about errors to straight-line code,
e.g. through loop invariants [9,14], loop unrolling [7], or path-wise analysis [7,9,15].
Step 4 Our tool performs roundoff error analysis again, this time taking into
account the new approximations’ precise error bounds reported by Metalibm. 
Finally, Daisy generates C code for the program itself, as well as all necessary 
headers to link with the approximation generated by Metalibm. 


Error Distribution. In order to call Metalibm, Daisy needs to determine the 
target error for each libm call. Recall that the user of our tool only specifies the 
total error at the end of the program. Hence, distributing the total error budget 
among arithmetic operations and (potentially several) elementary function calls 
is a crucial step. Consider again our running example which has two elementary 
function calls. Our tool distributes the error budget as follows: 


$$|f(x) - \tilde{f}(x)| \le |f(x) - f_1(x)| + |f_1(x) - f_2(x)| + |f_2(x) - \tilde{f}(x)|$$


where we denote by $f$ the real-valued specification of the program; $f_1$ and $f_2$ have
one and two elementary function calls approximated, respectively, and arithmetic
is considered exact; and $\tilde{f}$ is the final finite-precision implementation.

Daisy first determines the budget for the finite-precision roundoff error
($|f_2(x) - \tilde{f}(x)|$) and then distributes the remaining part among libm calls. At
this point, Daisy cannot compute $|f_2(x) - \tilde{f}(x)|$ exactly, as the approximations
are not available yet. Instead, it assumes libm-based implementations as a baseline.

Then, Daisy distributes the remaining error budget either equally among
the elementary function calls, or by taking into account that the approximation
errors are propagated differently through the program. This error propagation
is estimated by computing the derivative w.r.t. each elementary function call
(which gives an estimate of the condition number). Daisy computes partial
derivatives symbolically and maximizes them over the specified input domain.
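The two distribution strategies can be sketched as follows, assuming the roundoff budget has already been reserved and that, for each call i, an upper bound D_i on the magnitude of the partial derivative of the program with respect to that call's result is available. The text does not give Daisy's exact formula for the derivative-based variant; the proportional weighting here, which gives every call the same local target once divided by D_i, is our own illustrative choice, and `distribute_budget` is a hypothetical name. In the running example the derivatives with respect to the two sine results are the constants l1 = 0.5 and l2 = 2.5; the budget values are made up.

```python
from fractions import Fraction

def distribute_budget(total, roundoff, derivs, weighted=False):
    """Split the budget left after reserving `roundoff` among the elementary
    calls. derivs[i] is an assumed bound on |d program / d call_i| over the
    input domain. weighted=False: equal shares. weighted=True: shares
    proportional to derivs (a hypothetical weighting, for illustration)."""
    remaining = total - roundoff
    if remaining <= 0:
        raise ValueError("the roundoff budget alone exhausts the target error")
    if not weighted:
        return [remaining / len(derivs)] * len(derivs)
    s = sum(abs(d) for d in derivs)
    return [remaining * abs(d) / s for d in derivs]

# Running example: the derivatives w.r.t. the two sine results are l1 and l2.
derivs = [Fraction(1, 2), Fraction(5, 2)]
print(distribute_budget(Fraction(10), Fraction(4), derivs))
print(distribute_budget(Fraction(10), Fraction(4), derivs, weighted=True))
```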

Finally, we obtain an error budget for each libm call, representing the total 
error due to the elementary function call at the end of the program. For calling 
Metalibm, however, we need the local error at the function call site. Due to error 
propagation, these two errors can differ significantly, and may lead to overall 
errors which exceed the error bound specified by the user. We estimate the error 
propagation using a linear approximation based on derivatives, and use this 
estimate to compute a local target error from the total error budget. 
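The conversion from a share of the end-of-program budget to a local target error can be sketched with the same linearization: if the program's sensitivity to call i is bounded by D_i, a local error δ_i contributes roughly D_i · δ_i at the end, so δ_i = budget_i / D_i. The helper names `local_targets` and `propagated` and the budget values are hypothetical. For the running example the program is linear in each sine result, so the estimate is exact there; in general it is only first-order, and the final analysis of Step 4 accounts for the errors actually achieved.

```python
from fractions import Fraction

def local_targets(budgets, derivs):
    """First-order estimate: a local error delta_i at call i shows up at the
    end of the program as roughly |D_i| * delta_i, so delta_i = budget_i / |D_i|."""
    targets = []
    for b, d in zip(budgets, derivs):
        if d == 0:
            raise ValueError("zero sensitivity: the call does not affect the result")
        targets.append(b / abs(d))
    return targets

def propagated(targets, derivs):
    """Linearized end-of-program error caused by the local errors."""
    return sum(abs(d) * t for t, d in zip(targets, derivs))

# Running example: the program is linear in each sine result (D = l1, l2).
derivs = [Fraction(1, 2), Fraction(5, 2)]
budgets = [Fraction(3, 10**12), Fraction(3, 10**12)]  # hypothetical equal shares
deltas = local_targets(budgets, derivs)
print([float(t) for t in deltas])
```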

Since Metalibm usually generates approximations with slightly tighter error 
bounds than asked for, our tool performs a second roundoff analysis (step 4), 
where all errors (smaller or larger) are correctly taken into account. 


Polynomial Degree Selection. The polynomial degree significantly and in a dis- 
crete way influences the efficiency of approximations, so that optimal prediction 
is infeasible. Hence, our tool performs a linear search, using the (coarse) esti- 
mated running time reported by Metalibm (obtained with a few benchmarking 
runs) to select the approximation with the smallest estimated running time. The 
search stops either when the estimated running time is significantly higher than 
the current best, or when Metalibm times out. 
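The degree search can be sketched as a loop over candidate degrees. Here `estimate` stands in for a Metalibm run that returns an estimated running time, or None when no implementation meeting the target is found at that degree, and raising `MetalibmTimeout` models a timeout; these names, the degree range, and the factor 1.5 used for "significantly higher" are hypothetical parameters, not Metalibm's interface.

```python
from typing import Callable, Iterable, Optional, Tuple

class MetalibmTimeout(Exception):
    """Hypothetical signal that the approximation tool timed out."""

def select_degree(estimate: Callable[[int], Optional[float]],
                  degrees: Iterable[int],
                  stop_factor: float = 1.5) -> Optional[Tuple[int, float]]:
    """Linear search over polynomial degrees, keeping the fastest estimate."""
    best = None  # (degree, estimated running time)
    for deg in degrees:
        try:
            cycles = estimate(deg)
        except MetalibmTimeout:
            break                       # stop on timeout
        if cycles is None:              # no implementation at this degree
            continue
        if best is None or cycles < best[1]:
            best = (deg, cycles)
        elif cycles > stop_factor * best[1]:
            break                       # significantly slower than the best
    return best
```

Because the search stops at the first significantly slower degree, a faster implementation at a much higher degree can be missed, which is consistent with the remark that the tool does not guarantee the most efficient implementation.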

We do not automatically exploit other Metalibm parameters, such as the
minimum subdomain width for splitting, since they give fine-grained control that is
not suitable for general automatic implementations.
3 Experimental Evaluation


We evaluate our approach in terms of accuracy and performance on benchmarks
from the literature [9,19,28] which include elementary function calls, and
extend them with the examples rodriguesRotation³ and ex2_* and ex3_d, which
are problems from a graduate analysis textbook. While they are relatively short,
they represent important kernels usually employing several elementary function
calls⁴. We base target error bounds on roundoff errors of a libm implementation:
middle and large errors, which are roughly three and four orders of magnitude
larger than the libm-based bound, respectively. By default, we assume 64-bit
double precision.

Our tool provides an automatic generation of benchmarking code for each 
input program. Each benchmarking executable runs the Daisy-generated code 
on 10⁷ random inputs from the input domain and measures performance in the
number of processor clock cycles. Of the measured number of cycles we discard 
the highest 10%, as we have observed these to be outliers. 
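The trimming step can be sketched as follows, assuming the per-run cycle counts have already been collected by the generated harness. Reporting the mean and population standard deviation of the remaining runs is our assumption for illustration, and the function name is hypothetical.

```python
import statistics


def summarize_cycles(samples, discard_fraction=0.10):
    """Drop the highest `discard_fraction` of the cycle counts (treated as
    outliers) and return the mean and population standard deviation of the rest."""
    if not samples:
        raise ValueError("no measurements")
    ordered = sorted(samples)
    keep = len(ordered) - int(len(ordered) * discard_fraction)
    kept = ordered[:keep]
    return statistics.mean(kept), statistics.pstdev(kept)


# 18 typical runs plus two slow outliers: the two outliers are discarded.
runs = [100] * 9 + [110] * 9 + [5000, 7000]
print(summarize_cycles(runs))  # (105, 5.0)
```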


Experimental Results. By default, we approximate individual elementary func- 
tion calls separately, use equal error distribution and allow table-based approxi- 
mations with an 8-bit table index. For large errors we also measure performance 
for: (i) default settings but with the derivative-based error distribution; (ii)
default settings but without table usage; (iii) default settings but with compound
calls with depth 1 and depth ∞ (approximation ‘as much as possible’).

Table 1 shows the performance improvements of approximated code w.r.t.
libm-based implementations of our benchmarks. We compare against libm only,
as no approximation or synthesis tool provides error guarantees. By removing
libm calls in initial programs we roughly estimate the elementary function
overhead (second column) and give an idea of the margin of improvement. Figure 1
illustrates the overall improvement that we obtain for each benchmark (the
height of the bars) and the relative distribution of the running time between
arithmetic (blue) and elementary functions (green), for large errors with default
settings but approximate compound calls with depth = ∞.

Our tool generates code with significant performance improvements for most 
functions and often reduces the time spent for the evaluation of elementary 
functions by a factor of two. As expected, the improvements are overall better 
for larger errors and vary on average from 10.7% to 13.8% for individual calls 
depending on the settings, and reach 17.1% on average when approximating 
compound calls as much as possible. However, increasing the program target 
error (for equal error distributions Metalibm target error increases linearly with 
it) does not necessarily lead to better performance, e.g., in the case of axisRotationY
and rodriguesRotation. This is the result of discrete decisions concerning the 
approximation degrees and the domain splittings inside Metalibm. 


3 https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula.

4 Experiments are performed on a Debian Linux 9 Desktop machine with a 3.3 GHz 
Intel i5 processor and 16 GB of RAM. All code for benchmarking is compiled with 
GNU's g++, version 6.3.0, with the -O2 flag.
Table 1. Performance improvements (in percent) of approximated code w.r.t. a program
with libm library function calls.

                                double precision                                      single
                     elem.      middle | large errors                               | middle
benchmark            overhead   equal  | equal  deriv  no table  depth 1  depth ∞   | equal
sinxx10              20.8       7.6    | 7.7    7.7    7.7       7.6      7.7       | 4.7
xu1                  49.3       13.9   | 25.8   18.0   26.6      25.7     27.3      | 8.1
xu2                  53.6       4.6    | 12.4   13.0   12.6      12.5     26.0      | -1.4
integrate18257       52.8       15.2   | 19.4   15.1   -4.5      22.4     31.7      | 2.1
integStoutemyer      42.1       -1.0   | 6.5    1.4    0.4       4.8      21.9      | 6.4
axisRotationX        38.0       17.2   | 17.3   18.1   17.4      17.6     17.3      | -10.5
axisRotationY        37.9       17.6   | 12.8   21.5   12.9      12.8     12.8      | -14.1
rodriguesRotation    28.9       14.9   | 11.6   13.6   13.8      13.8     13.9      | -7.6
pendulum1            24.4       -4.6   | -2.9   -4.3   -4.2      11.0     11.7      | -9.7
pendulum2            50.3       9.6    | 11.4   6.2    -0.8      20.2     20.5      | -0.5
forwardk2jX          43.7       15.1   | 15.4   15.5   15.0      15.0     15.0      | -10.2
forwardk2jY          40.7       10.7   | 15.6   15.6   15.6      15.6     15.6      | 7.4
ex2_1                34.6       12.8   | 12.8   12.3   12.3      12.3     12.1      | 8.4
ex2_2                34.9       5.9    | 14.8   15.4   15.1      15.0     15.3      | 3.6
ex2_3                42.1       23.5   | 24.5   24.5   24.1      24.8     24.3      | 3.9
ex2_4                31.8       11.9   | 12.5   12.5   12.6      14.3     14.3      | 7.9
ex2_5                40.6       22.5   | 24.4   24.5   24.4      24.4     24.3      | 10.2
ex2_9                35.0       7.2    | 7.1    7.4    7.2       7.0      9.4       | -10.1
ex2_10               41.5       20.6   | 21.7   8.9    20.5      21.3     21.4      | 8.3
ex2_11               30.9       -6.8   | -2.3   -4.9   -2.4      -4.8     -2.8      | 17.9
ex3_d                39.3       10.3   | 20.9   19.9   -1.1      19.9     20.3      | 4.9
average              38.7       10.9   | 13.8   12.5   10.7      14.9     17.1      | 1.4


Somewhat surprisingly, we did not observe an advantage of using the 
derivative-based error distribution over the equal one. We suspect that this is due to
the nonlinear nature of Metalibm’s heuristics.

Table 1 further demonstrates that usage of tables generally improves the per- 
formance. However, the influence of increasing the table size must be studied on a 
case-by-case basis since large tables might lead to memory-bound computations. 

We observe that it is generally beneficial to approximate ‘as much as possi- 
ble’. Indeed, the power of Metalibm lies in generating (piece-wise) polynomial 
approximations of compound expressions, whose behavior might be much simpler
to evaluate than that of their individual subexpressions.

Finally, we also considered an implementation where all data and arithmetic 
operations are in single precision apart from the double-precision Metalibm- 
generated code (whose output is accurate only to single precision). We observe 
that slight performance improvements are possible, i.e., Metalibm can compete
even with single-precision libm-based code, but to achieve performance improvements
comparable to those of double-precision code, we need single-precision
code generation from Metalibm.




[Figure 1 (bar chart): number of clock cycles (y-axis, "# clock cycles") per benchmark (x-axis); bars split into elem. original, arith. original, elem. approximation, and arith. approximation.]


Fig. 1. Average performance and standard deviation. For each benchmark, the first 
bar shows the running time of the libm-based implementation and the second one of 
our implementation. Even relatively small overall time improvements are significant 
w.r.t. the time portion we can optimize (in green). Our implementations also have
significantly smaller standard deviation (black bars). (Color figure online)


Analysis Time. Analysis time is highly dependent on the number of required
approximations of elementary functions: each approximation requires a separate
call to Metalibm whose running time in turn depends on the problem definition.
Daisy reduces the number of calls to Metalibm by common subexpression elimination,
which improves the analysis time. Currently, we set the timeout for each
Metalibm call to 3 min, which keeps the overall analysis time reasonable.
Overall, our tool takes between 15 s and 20 min to approximate whole
programs, with an average running time of 4 min 40 s per program.
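The effect of common subexpression elimination on the number of Metalibm calls can be illustrated with a toy sketch: structurally identical elementary function calls are collected once, so each needs only one approximation. The tuple-based expression encoding and the function set ELEMENTARY are assumptions for illustration; Daisy works on its own internal representation.

```python
# Assumption: expressions are nested tuples (operator, arg1, arg2, ...),
# and leaves are variable names or numeric constants.
ELEMENTARY = {"sin", "cos", "tan", "exp", "log"}


def count_calls(expr):
    """Number of elementary function call sites before elimination."""
    if not isinstance(expr, tuple):
        return 0
    op, *args = expr
    return (op in ELEMENTARY) + sum(count_calls(a) for a in args)


def distinct_calls(expr):
    """Structurally distinct elementary calls, in first-occurrence order.
    Each entry would need one approximation (one Metalibm call)."""
    seen = {}  # tuples are hashable, so equal subtrees map to one key

    def walk(e):
        if isinstance(e, tuple):
            op, *args = e
            for arg in args:
                walk(arg)
            if op in ELEMENTARY:
                seen.setdefault(e, None)

    walk(expr)
    return list(seen)


# sin(x*y) + cos(x) * sin(x*y): three call sites, two distinct calls.
prog = ("+", ("sin", ("*", "x", "y")),
             ("*", ("cos", "x"), ("sin", ("*", "x", "y"))))
print(count_calls(prog), len(distinct_calls(prog)))  # 3 2
```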


4 Conclusion 

We presented a fully automated approach which improves the performance of 
small numerical kernels at the expense of some accuracy by generating custom 
approximations of elementary functions. Our tool is parametrized by a user-given 
whole-program absolute error bound which is guaranteed to be satisfied by the 
generated code. Experiments illustrate that the tool efficiently uses the available 
margin for improvement and provides significant speedups for double-precision 
implementations. This work provides a solid foundation for future research in the
areas of automatic approximation of single-precision and multivariate functions.
Abstract. We formalize the theory of quantum Hoare logic (QHL)
[TOPLAS 33(6),19], an extension of Hoare logic for reasoning about 
quantum programs. In particular, we formalize the syntax and semantics 
of quantum programs in Isabelle/HOL, write down the rules of quantum 
Hoare logic, and verify the soundness and completeness of the deduc- 
tion system for partial correctness of quantum programs. As preliminary 
work, we formalize some necessary mathematical background in linear 
algebra, and define tensor products of vectors and matrices on quantum 
variables. As an application, we verify the correctness of Grover’s search 
algorithm. To the best of our knowledge, this is the first time a Hoare logic for
quantum programs has been formalized in an interactive theorem prover and
used to verify the correctness of a nontrivial quantum algorithm.


1 Introduction 


Due to the rapid progress of quantum technology in recent years, it is
predicted that practical quantum computers can be built within 10-15 years.
Especially during the last 3 years, breakthroughs have been made in quantum
hardware. Programmable superconducting quantum computers and trapped-ion
quantum computers have been built in universities and companies [1,3,4,6,23].
In another direction, intensive research on quantum programming has been
conducted in the last decade [16,45,51,53], as surveyed in [27,52]. In particular,
several quantum programming languages have been defined and their compilers
have been implemented, including Quipper [31], Scaffold [35], QWire [47],
Microsoft’s LIQUi|⟩ [25] and Q# [57], IBM’s OpenQASM [22], Google’s Cirq
[30], ProjectQ [56], Chisel-Q [40], Quil [55] and Q|SI⟩ [39]. This research allows
quantum programs to first run on an ideal simulator for testing, and then on
physical devices [5]. For instance, many small quantum algorithms and protocols
have already been programmed and run on IBM’s simulators and quantum
computers [1,2].
© The Author(s) 2019 


I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 187-207, 2019. 
Clearly, simulators can only be used for testing. Testing shows the correctness of
the program on one or a few inputs, not its correctness under all possible inputs.
Various theories and tools have been developed to formally reason about quan- 
tum programs for all inputs on a fixed number of qubits. Equivalence checking 
[7,8], termination analysis [38], reachability analysis [64], and invariant gen- 
eration [62] can be used to verify the correctness or termination of quantum 
programs. Unfortunately, the size of quantum programs on which these tools 
are applicable is quite limited. This is because all of these tools still perform 
calculations over the entire state space, which for quantum algorithms has size 
exponential in the number of qubits. For instance, even on the best supercomputers
today, simulation of a quantum program is restricted to about 50-60
qubits. Most model-checking algorithms, which need to perform calculations on 
operators over the state space, are restricted to 25-30 qubits with the current 
computing resources. 
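A back-of-the-envelope calculation illustrates these limits. It assumes a dense representation with 16-byte complex entries (two IEEE 754 doubles); real simulators may use other encodings, so the numbers indicate orders of magnitude only.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Dense state vector: 2**n complex amplitudes.
    Assumption: 16 bytes per amplitude (two 64-bit floats)."""
    return 2 ** num_qubits * bytes_per_amplitude


def operator_bytes(num_qubits, bytes_per_entry=16):
    """Dense 2**n x 2**n operator, e.g. a density matrix."""
    return 4 ** num_qubits * bytes_per_entry


PIB = 2 ** 50  # bytes in one pebibyte
print(statevector_bytes(50) / PIB)  # 16.0
print(operator_bytes(25) / PIB)     # 16.0
```

A dense operator on 25 qubits needs as much memory as a state vector on 50 qubits, which is consistent with operator-based model checking stopping at roughly half as many qubits as simulation.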

Deductive program verification presents a way to solve this state space explo- 
sion problem. In deductive verification, we do not attempt to execute the pro- 
gram or explore its state space. Rather, we define the semantics of the program 
using precise mathematical language, and use mathematical reasoning to prove 
the correctness of the program. These proofs are checked on a computer (for 
example, in proof assistants such as Coq [15] or Isabelle [44]) to ensure a very 
high level of confidence. 

To apply deductive reasoning to quantum programs, it is necessary to first 
define a precise semantics and proof system. There has already been a lot of work 
along these lines [9,20, 21,61]. A recent result in this direction is quantum Hoare 
logic (QHL) [61]. It extends to sequential quantum programs the Floyd-Hoare- 
Naur inductive assertion method for reasoning about correctness of classical 
programs. QHL is proved to be (relatively) complete for both partial correctness 
and total correctness of quantum programs. 

In this paper, we formalize the theory of quantum Hoare logic in
Isabelle/HOL, and use it to verify a non-trivial quantum algorithm, Grover’s
search algorithm¹. In more detail, the contributions of this paper are as follows.


1. We formally prove the main results of quantum Hoare logic in Isabelle/HOL. 
That is, we write down the syntax and semantics of quantum programs, spec- 
ify the basic Hoare triples, and prove the soundness and completeness of the 
resulting deduction system (for partial correctness of quantum programs). To
the best of our knowledge, this is the first formalization of a Hoare logic for quantum
programs in an interactive theorem prover.

2. As an application of the above formalization, we verify the correctness of 
Grover’s search algorithm. In particular, we prove that the algorithm always 
succeeds on the (infinite) class of inputs where the expected probability of 
success is 1. 

3. As preparation for the above, we extend Isabelle/HOL’s library for linear
algebra. Based on existing work [13,58], we formalize many further results in
linear algebra for complex matrices, in particular positivity and the Löwner
order. Another significant part of our work is to define the tensor product
of vectors and matrices, in a way that can be used to extend and combine
operations on quantum variables in a consistent way. Finally, we implement
algorithms to automatically prove identities in linear algebra to ease the
formalization process.

¹ Available online at https://www.isa-afp.org/entries/QHLProver.html.


The organization of the rest of the paper is as follows. Section 2 gives a brief 
introduction to quantum Hoare logic. Section 3 describes in detail our formal- 
ization of QHL in Isabelle/HOL. Section 4 describes the application to Grover’s 
algorithm. Section 5 discusses automation techniques, and gives some idea about 
the cost of the formalization. Section 6 reviews some related work. Finally, we
conclude in Sect. 7 with a discussion of future directions of work. 

We expect that theorem proving techniques will play a crucial role in formal
reasoning about quantum computing, as they did for classical computing, and we
hope this paper will be one of the first steps in this direction.


2 Quantum Hoare Logic 


In this section, we briefly recall the basic concepts and results of quantum Hoare 
logic (QHL). We only introduce the proof system for partial correctness, since 
the one for total correctness is not formalized in our work. In addition, we make 
two simplifications compared to the original work: we consider only variables 
with finite dimension, and we remove the initialization operation. The complete 
version of QHL can be found in [61]. 

In QHL, the number of quantum variables is pre-set before each run of the
program. Each quantum variable $q_i$ has dimension $d_i$. The (pure) state of the
quantum variable takes value in a complex vector space of dimension $d_i$. The
overall (pure) state takes value in the tensor product of the vector spaces for the
variables, which has dimension $d = \prod_i d_i$. The mixed state for variable $q_i$ (resp.
overall) is given by a $d_i \times d_i$ (resp. $d \times d$) matrix satisfying certain conditions
(making them partial density operators). The notation $\bar{q}$ is used to denote some
finite sequence of distinct quantum variables (called a quantum register). We
denote the vector space corresponding to $\bar{q}$ by $\mathcal{H}_{\bar{q}}$.

The syntax of quantum programs is given by the following grammar: 


$$S ::= \mathbf{skip} \mid \bar{q} := U\bar{q} \mid S_1; S_2 \mid \mathbf{measure}\ M[\bar{q}] : \bar{S} \mid \mathbf{while}\ M[\bar{q}] = 1\ \mathbf{do}\ S$$
where 


- In $\bar{q} := U\bar{q}$, $U$ is a unitary operator on $\mathcal{H}_{\bar{q}}$, i.e., $U^\dagger U = UU^\dagger = I$, where $U^\dagger$ is
the conjugate transpose of $U$;

- In $\mathbf{measure}\ M[\bar{q}] : \bar{S}$, $M = \{M_m\}$ is a quantum measurement on $\mathcal{H}_{\bar{q}}$, and
$\bar{S} = \{S_m\}$ gives quantum programs that will be executed after each possible
outcome of the measurement;

- In $\mathbf{while}\ M[\bar{q}] = 1\ \mathbf{do}\ S$, $M = \{M_0, M_1\}$ is a yes-no measurement on $\bar{q}$.
Quantum programs can be regarded as quantum extensions of classical while
programs. The skip statement does nothing, which is the same as in the classical
case. The unitary transformation changes the state of $\bar{q}$ according to $U$. It is
the counterpart to the assignment operation in classical programming languages.
The sequential composition is similar to its classical counterpart. The measurement
statement is the quantum generalisation of the classical case statement
$\mathbf{if}\ (\square m \cdot b_m \rightarrow S_m)\ \mathbf{fi}$. The loop statement is a quantum generalisation of the
classical loop $\mathbf{while}\ b\ \mathbf{do}\ S$.


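$$(\text{Skip})\quad \{P\}\ \mathbf{skip}\ \{P\}$$

$$(\text{UT})\quad \{U^\dagger P U\}\ \bar{q} := U\bar{q}\ \{P\}$$

$$(\text{Seq})\quad \frac{\{P\}\ S_1\ \{Q\} \qquad \{Q\}\ S_2\ \{R\}}{\{P\}\ S_1; S_2\ \{R\}}$$

$$\frac{\{P_m\}\ S_m\ \{Q\}\ \text{for all } m}{\{\sum_m M_m^\dagger P_m M_m\}\ \mathbf{measure}\ M[\bar{q}] : \bar{S}\ \{Q\}}$$

$$(\text{Loop})\quad \frac{\{Q\}\ S\ \{M_0^\dagger P M_0 + M_1^\dagger Q M_1\}}{\{M_0^\dagger P M_0 + M_1^\dagger Q M_1\}\ \mathbf{while}\ M[\bar{q}] = 1\ \mathbf{do}\ S\ \{P\}}$$

$$(\text{Order})\quad \frac{P \sqsubseteq P' \qquad \{P'\}\ S\ \{Q'\} \qquad Q' \sqsubseteq Q}{\{P\}\ S\ \{Q\}}$$


Fig. 1. Proof system qPD for partial correctness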


Formally, the denotational semantics for quantum programs is defined as a 
super-operator $[\![S]\!](\cdot)$, assigning to each quantum program $S$ a mapping between
partial density operators. As usual, the denotational semantics is defined by 
induction on the structure of the quantum program: 


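- $[\![\mathbf{skip}]\!](\rho) = \rho$.
- $[\![\bar{q} := U\bar{q}]\!](\rho) = U\rho U^\dagger$.
- $[\![S_1; S_2]\!](\rho) = [\![S_2]\!]([\![S_1]\!](\rho))$.
- $[\![\mathbf{measure}\ M[\bar{q}] : \bar{S}]\!](\rho) = \sum_m [\![S_m]\!](M_m \rho M_m^\dagger)$.
- $[\![\mathbf{while}\ M[\bar{q}] = 1\ \mathbf{do}\ S]\!](\rho) = \bigvee_{n=0}^{\infty} [\![(\mathbf{while}\ M[\bar{q}] = 1\ \mathbf{do}\ S)^n]\!](\rho)$, where $\bigvee$
stands for the least upper bound of partial density operators according to the
Löwner partial order $\sqsubseteq$.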
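To make the first four clauses concrete, here is a small numerical sketch for a single qubit, with matrices as nested lists of Python numbers. The while clause involves a least upper bound (a limit) and is omitted. All function names are hypothetical and unrelated to the Isabelle formalization described later.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def sem_skip(rho):
    return rho


def sem_unitary(u, rho):
    """[[q := U q]](rho) = U rho U^dagger"""
    return matmul(matmul(u, rho), dagger(u))


def sem_seq(s1, s2, rho):
    """[[S1; S2]](rho) = [[S2]]([[S1]](rho))"""
    return s2(s1(rho))


def sem_measure(ms, branches, rho):
    """[[measure M : S]](rho) = sum_m [[S_m]](M_m rho M_m^dagger)"""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for m_op, branch in zip(ms, branches):
        part = branch(matmul(matmul(m_op, rho), dagger(m_op)))
        out = [[out[i][j] + part[i][j] for j in range(n)] for i in range(n)]
    return out


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
P0, P1 = [[1, 0], [0, 0]], [[0, 0], [0, 1]]  # projectors onto |0> and |1>
rho0 = [[1, 0], [0, 0]]                      # the state |0><0|

# H followed by a computational-basis measurement whose branches are both skip
rho2 = sem_seq(lambda r: sem_unitary(H, r),
               lambda r: sem_measure([P0, P1], [sem_skip, sem_skip], r),
               rho0)
print([[round(x.real, 6) for x in row] for row in rho2])  # [[0.5, 0.0], [0.0, 0.5]]
```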


The correctness of a quantum program S is expressed by a quantum extension 
of the Hoare triple {P}S{Q}, where the precondition P and the postcondition 
Q are matrices satisfying certain conditions for quantum predicates [24]. The 
semantics for partial correctness is defined as follows: 


$$\models_{\mathrm{par}} \{P\}S\{Q\} \iff \mathrm{tr}(P\rho) \le \mathrm{tr}(Q[\![S]\!](\rho)) + \mathrm{tr}(\rho) - \mathrm{tr}([\![S]\!](\rho))$$
for all partial density operators $\rho$. Here tr is the trace of a matrix. The semantics
for total correctness is defined similarly:


$$\models_{\mathrm{tot}} \{P\}S\{Q\} \iff \mathrm{tr}(P\rho) \le \mathrm{tr}(Q[\![S]\!](\rho))$$


We note that they become the same when the quantum program $S$ is terminating,
i.e., $\mathrm{tr}([\![S]\!](\rho)) = \mathrm{tr}(\rho)$ for all partial density operators $\rho$.
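The partial-correctness condition can be tested numerically on sample states. The sketch below, with hypothetical helper names, checks the inequality for a single-qubit program that applies a Hadamard gate, using a precondition built as in rule (UT) of Fig. 1. Checking finitely many states can only refute a triple, never prove it.

```python
import math


def mm(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dag(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def tr(a):
    return sum(a[i][i] for i in range(len(a))).real


def partial_holds_on(P, Q, sem, rho, tol=1e-12):
    """tr(P rho) <= tr(Q [[S]](rho)) + tr(rho) - tr([[S]](rho)) for one state rho."""
    out = sem(rho)
    return tr(mm(P, rho)) <= tr(mm(Q, out)) + tr(rho) - tr(out) + tol


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]


def sem_h(rho):
    """[[q := H q]](rho) = H rho H^dagger"""
    return mm(mm(H, rho), dag(H))


Q = [[1, 0], [0, 0]]          # postcondition |0><0|
P = mm(mm(dag(H), Q), H)      # precondition H^dagger Q H
I2 = [[1, 0], [0, 1]]         # a precondition that is too strong

samples = [[[1, 0], [0, 0]], [[0, 0], [0, 1]],
           [[0.5, 0.5], [0.5, 0.5]], [[0.25, 0], [0, 0.25]]]  # last one has trace 0.5
print(all(partial_holds_on(P, Q, sem_h, r) for r in samples))  # True
print(partial_holds_on(I2, Q, sem_h, samples[0]))              # False
```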

The proof system qPD for partial correctness of quantum programs is given 
in Fig. 1. The soundness and (relative) completeness of qPD are proved in [61]:


Theorem 1. The proof system qPD is sound and (relative) complete for partial 
correctness of quantum programs. 


3 Formalization in Isabelle/HOL 


In this section, we describe the formalization of quantum Hoare logic in 
Isabelle/HOL. Isabelle/HOL [44] is an interactive theorem prover based on 
higher-order logic. It provides a flexible language in which one can state and 
prove theorems in all areas of mathematics and computer science. The proofs 
are checked by the Isabelle kernel according to the rules of higher-order logic, 
providing a very high level of confidence in the proofs. A standard application 
of Isabelle/HOL is the formalization of program semantics and Hoare logic. See 
[43] for a description of the general technique, applied to a very simple classical 
programming language. 


3.1 Preliminaries in Linear Algebra 


Our work is based on the linear algebra library developed by Thiemann and 
Yamada in the AFP entry [58]. We also use some results on the construction of 
tensor products in another AFP entry by Bentkamp [13]. 

In these libraries, the type 'a vec of vectors with entries in type 'a is defined
as pairs (n, f), where n is a natural number, and f is a function from natural
numbers to 'a, such that f(i) is undefined when i ≥ n. Likewise, the type 'a mat
of matrices is defined as triples (nr, nc, f), where nr and nc are natural numbers,
and f is a function from pairs of natural numbers to 'a, such that f(i, j) is
undefined when i ≥ nr or j ≥ nc. The terms carrier_vec n (resp. carrier_mat m
n) represent the set of vectors of length n (resp. matrices of dimension m × n).
In our work, we focus almost exclusively on the case where ’a is the complex 
numbers. For this case, existing libraries already define concepts such as the 
adjoint of a matrix, and the (complex) inner product between two vectors. We 
further define concepts such as Hermitian and unitary matrices, and prove their 
basic properties. 

A key result in linear algebra that is necessary for our work is the Schur
decomposition theorem. It states that any complex $n \times n$ matrix $A$ can be written
in the form $QUQ^{-1}$, where $Q$ is unitary and $U$ is upper triangular. In particular,
if $A$ is normal (that is, if $AA^\dagger = A^\dagger A$), then $A$ is diagonalizable. A version of
the Schur decomposition theorem is formalized in [58], showing that any matrix
is similar to an upper-triangular matrix $U$. However, it does not show that $Q$
can be made unitary. We complete the proof of the full theorem, following the
outline of the previous proof.

Next, we define the key concept of positive semi-definite matrices (called
positive matrices from now on for simplicity). An $n \times n$ matrix $A$ is positive if
$v^\dagger A v \ge 0$ for any vector $v$. We formalize the basic theory of positive matrices,
in particular showing that any positive matrix is Hermitian.

Density operators and partial density operators are then defined as follows: 


definition density_operator A ⟷ positive A ∧ trace A = 1
definition partial_density_operator A ⟷ positive A ∧ trace A ≤ 1


Next, the Löwner partial order is defined as a partial order on the type
complex mat as follows:


definition lowner_le (infix ≤L 65) where
A ≤L B ⟷ dim_row A = dim_row B ∧ dim_col A = dim_col B ∧ positive (B - A)
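For intuition, positivity, partial density operators and the Löwner order can be mirrored numerically in the 2 × 2 case, where a Hermitian matrix is positive exactly when its diagonal entries and its determinant are non-negative. This shortcut is specific to 2 × 2 matrices (the formalization treats arbitrary n × n matrices), and the function names are hypothetical. In line with the result that positive matrices are Hermitian, the sketch rejects non-Hermitian inputs.

```python
def is_hermitian(a, tol=1e-12):
    return all(abs(a[i][j] - complex(a[j][i]).conjugate()) <= tol
               for i in range(2) for j in range(2))


def is_positive(a, tol=1e-12):
    """v^dagger A v >= 0 for all v, 2x2 case only (assumption): for a Hermitian
    2x2 matrix this holds iff both diagonal entries and the determinant are >= 0."""
    if not is_hermitian(a, tol):
        return False
    p, q = complex(a[0][0]).real, complex(a[1][1]).real
    det = p * q - abs(a[0][1]) ** 2
    return p >= -tol and q >= -tol and det >= -tol


def trace(a):
    return complex(a[0][0] + a[1][1]).real


def is_partial_density_operator(a):
    return is_positive(a) and trace(a) <= 1 + 1e-12


def lowner_le(a, b):
    """A <=_L B iff B - A is positive (dimensions are fixed to 2x2 here)."""
    return is_positive([[b[i][j] - a[i][j] for j in range(2)] for i in range(2)])


proj0 = [[1, 0], [0, 0]]   # |0><0|
proj1 = [[0, 0], [0, 1]]   # |1><1|
half0 = [[0.5, 0], [0, 0]]
print(is_partial_density_operator([[0.5, 0.5], [0.5, 0.5]]))  # True
print(is_positive([[1, 2], [2, 1]]))                          # False (eigenvalue -1)
print(lowner_le(half0, proj0), lowner_le(proj0, proj1), lowner_le(proj1, proj0))  # True False False
```

The last line shows that ≤L is only a partial order: the projectors onto |0⟩ and |1⟩ are incomparable.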


A key result that we formalize states that under the Löwner partial order, any
non-decreasing sequence of partial density operators has a least upper bound,
which is the pointwise limit of the operators when written as n × n matrices.
This is used to define the infinite sum of matrices, necessary for the semantics 
of the while loop. 


3.2 Syntax and Semantics of Quantum Programs 


We now begin with the definition of syntax and semantics of quantum programs. 
First, we describe how to model states of a quantum program. Recall that each
quantum program operates on a fixed set of quantum variables $q_i$, where each $q_i$
has dimension $d_i$. This information can be recorded in a locale [33] as follows:


locale state_sig = 
fixes dims :: nat list 


The total dimension d is given as follows (here prod_list denotes the product of a
list of natural numbers):


definition d = prod_list dims 


The (mixed) state of the system is given by a partial density operator with 
dimension d x d. Hence, we declare 


type_synonym state = complex mat 


definition density_states :: state set where 
density_states = {ρ ∈ carrier_mat d d. partial_density_operator ρ}


Next, we define the concept of quantum programs. They are declared as an 
inductively-defined datatype in Isabelle/HOL, following the grammar given in 
Sect. 2. 
datatype com =
  SKIP
| Utrans (complex mat)
| Seq com com (_;;/ _ [60, 61] 60)
| Measure nat (nat ⇒ complex mat) (com list)
| While (nat ⇒ complex mat) com


At this stage, we assume that all matrices involved operate on the global state 
(that is, all of the quantum variables). We will define commands that operate on a 
subset of quantum variables later. Measurement is defined over any finite number 
of matrices. Here Measure n f C is a measurement with n options, f i for i < n
are the measurement matrices, and C ! i is the command to be executed when the
measurement yields result i. Likewise, the first argument to While gives measurement
matrices, where only the first two values are used.

Next, we define well-formedness and denotation of quantum programs. The 
predicate well_com :: com ⇒ bool expresses the well-formedness condition. For a
quantum program to be well-formed, all matrices involved should have the right
dimension, the argument to Utrans should be unitary, and the measurements for
Measure and While should satisfy the condition $\sum_i M_i^\dagger M_i = I_n$. Denotation is
written as denote :: com ⇒ state ⇒ state, defined as in Sect. 2. Both well_com
and denote are defined by induction over the structure of the program. The details
are omitted here.
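Two of these well-formedness conditions, unitarity of Utrans arguments and completeness of measurements, can be checked numerically as sketched below. The helper names are hypothetical, dimension checks are omitted, and a floating-point tolerance replaces the exact equalities used in Isabelle.

```python
import math


def gram(m):
    """M^dagger M for a square matrix given as nested lists."""
    n = len(m)
    return [[sum(complex(m[k][i]).conjugate() * m[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def is_identity(a, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))


def is_unitary(u):
    """U^dagger U = I; for square matrices this also implies U U^dagger = I."""
    return is_identity(gram(u))


def is_measurement(ms):
    """sum_i M_i^dagger M_i = I_n for the measurement operators ms."""
    n = len(ms[0])
    total = [[0j] * n for _ in range(n)]
    for m in ms:
        g = gram(m)
        total = [[total[i][j] + g[i][j] for j in range(n)] for i in range(n)]
    return is_identity(total)


s = 1 / math.sqrt(2)
print(is_unitary([[s, s], [s, -s]]))                           # True
print(is_measurement([[[1, 0], [0, 0]], [[0, 0], [0, 1]]]))   # True
print(is_measurement([[[1, 0], [0, 0]]]))                      # False
print(is_measurement([[[math.sqrt(0.3), 0], [0, 1]],
                      [[math.sqrt(0.7), 0], [0, 0]]]))         # True (non-projective)
```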


3.3 Hoare Triples 


In this section, we define the concept of Hoare triples, and state what needs to 
be proved for soundness and completeness of the deduction system. First, the 
concept of quantum predicates is defined as follows: 


definition is_quantum_predicate P ⟷ P ∈ carrier_mat d d ∧ positive P ∧ P ≤L 1m d


With this, we can give the semantic definition of Hoare triples for partial and 
total correctness. These definitions are intended for the case where P and Q are 
quantum predicates, and S is a well-formed program. They define which Hoare
triples are valid.


definition hoare_total_correct (⊨t {(1_)}/ (_)/ {(1_)} 50) where
⊨t {P} S {Q} ⟷ (∀ρ∈density_states. trace (P * ρ) ≤ trace (Q * denote S ρ))


definition hoare_partial_correct (⊨p {(1_)}/ (_)/ {(1_)} 50) where
⊨p {P} S {Q} ⟷ (∀ρ∈density_states.
trace (P * ρ) ≤ trace (Q * denote S ρ) + (trace ρ - trace (denote S ρ)))


Next, we define which Hoare triples are provable in the qPD system. A Hoare
triple for partial correctness is provable (written as ⊢p {P} S {Q}) if it can
be derived by combining the rules in Fig. 1. This condition can be defined in 
Isabelle/HOL as an inductive predicate. The definition largely parallels the for- 
mulae shown in the figure. 
With these definitions, we can state and prove soundness and completeness of
the Hoare rules for partial correctness. Note that the statement for completeness 
is very simple, seemingly without needing to state “relative to the theory of the 
field of complex numbers”. This is because we are taking a shallow embedding 
for predicates, hence any valid statement on complex numbers, in particular 
positivity of matrices, is in principle available for use in the deduction system 
(for example, in the assumption to the order rule). 


theorem hoare_partial_sound:
⊢p {P} S {Q} ⟹ well_com S ⟹ ⊨p {P} S {Q}


theorem hoare_partial_complete:
⊨p {P} S {Q} ⟹ well_com S ⟹
is_quantum_predicate P ⟹ is_quantum_predicate Q ⟹ ⊢p {P} S {Q}


The soundness of the Hoare rules is proved by induction on the predicate
⊢p, showing that each rule is sound with respect to ⊨p. Completeness is proved
using the concept of weakest preconditions, following [61].


3.4 Partial States and Tensor Products 


So far in our development, all quantum operations act on the entire global state. 
However, for the actual applications, we are more interested in operations that 
act on only a few of the quantum variables. For this, we need to define an 
extension operator that takes a matrix on the quantum state for a subset of the
variables and extends it to a matrix on all of the variables. More generally, we
need to define tensor products on vectors and matrices defined over disjoint sets 
of variables. These need to satisfy various consistency properties, in particular 
commutativity and associativity of the tensor product. Note that directly using 
the Kronecker product is not enough, as the matrix to be extended may act 
on any (possibly non-adjacent) subset of variables, and we need to distinguish 
between all possible cases. 

Before presenting the definition, we first review some preliminaries. We make 
use of existing work in [13], in particular their encode and decode operations, and 
emulate their definitions of matricize and dematricize (used in [13] to convert 
between tensors represented as a list and matrices). Given a list of dimensions $d_i$,
the encode and decode operations (named digit_encode and digit_decode) produce
a correspondence between lists of indices $a_i$ satisfying $a_i < d_i$ for each $i < n$,
and a natural number less than $\prod_i d_i$. This works in a way similar to finding
the binary representation of a number (in which case all “dimensions” are 2).
List operation nths xs S constructs the subsequence of xs containing only the
elements at indices in the set S.

The locale partial_state extends state_sig, adding vars for a subset of quantum 
variables. Our goal is to define the tensor product of two vectors or matrices over 
vars and its complement -vars, respectively.
locale partial_state = state_sig +
fixes vars :: nat set 


First, dims1 and dims2 are dimensions of variables vars and -vars: 


definition dims1 = nths dims vars 
definition dims2 = nths dims (-vars)


The operation encode1 (resp. encode2) provides the map from the product
of dims to the product of dims1 (resp. dims2).


definition encode1 i = digit_decode dims1 (nths (digit_encode dims i) vars)
definition encode2 i = digit_decode dims2 (nths (digit_encode dims i) (-vars))


With this, tensor products on vectors and matrices are defined as follows 
(here d is the product of dims). 


definition tensor_vec :: 'a vec ⇒ 'a vec ⇒ 'a vec where
tensor_vec v1 v2 = Matrix.vec d (λi. v1 $ encode1 i * v2 $ encode2 i)


definition tensor_mat :: 'a mat ⇒ 'a mat ⇒ 'a mat where
tensor_mat m1 m2 = Matrix.mat d d (λ(i,j).
m1 $$ (encode1 i, encode1 j) * m2 $$ (encode2 i, encode2 j))


We prove the basic properties of tensor_vec and tensor_mat, including that 
they behave correctly with respect to identity, multiplication, adjoint, and trace. 

Extension of matrices is a special case of the tensor product, where the matrix
on -vars is the identity (here d2 is the product of dims2).


definition mat_extension :: 'a mat ⇒ 'a mat where
mat_extension m = tensor_mat m (1m d2)
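The following sketch mirrors digit_encode, digit_decode, nths, encode1/encode2, tensor_vec, tensor_mat and mat_extension in Python so that the index bookkeeping can be inspected, including the case where vars is not a contiguous block of variables. It assumes that digit_encode puts the least significant digit first; this is one natural convention and may differ from the AFP definitions, although the properties checked below (compatibility with multiplication and trace) do not depend on it. The class and method names are hypothetical.

```python
import math
from typing import List, Set


def digit_encode(dims: List[int], a: int) -> List[int]:
    """Mixed-radix digits of a, least significant first (assumed convention)."""
    digits = []
    for d in dims:
        digits.append(a % d)
        a //= d
    return digits


def digit_decode(dims: List[int], digits: List[int]) -> int:
    """Inverse of digit_encode: a = i0 + d0 * (i1 + d1 * (i2 + ...))."""
    a = 0
    for d, i in zip(reversed(dims), reversed(digits)):
        a = i + d * a
    return a


def nths(xs: list, s: Set[int]) -> list:
    """Subsequence of xs keeping the elements at the positions in s."""
    return [x for k, x in enumerate(xs) if k in s]


class PartialState:
    """Hypothetical analogue of the partial_state locale (dims plus a set vars)."""

    def __init__(self, dims: List[int], vars_: Set[int]):
        self.dims, self.vars = dims, set(vars_)
        # -vars restricted to valid positions (nths ignores the others anyway)
        self.comp = set(range(len(dims))) - self.vars
        self.dims1, self.dims2 = nths(dims, self.vars), nths(dims, self.comp)
        self.d = math.prod(dims)

    def encode1(self, i: int) -> int:
        return digit_decode(self.dims1, nths(digit_encode(self.dims, i), self.vars))

    def encode2(self, i: int) -> int:
        return digit_decode(self.dims2, nths(digit_encode(self.dims, i), self.comp))

    def tensor_vec(self, v1, v2):
        return [v1[self.encode1(i)] * v2[self.encode2(i)] for i in range(self.d)]

    def tensor_mat(self, m1, m2):
        e1, e2 = self.encode1, self.encode2
        return [[m1[e1(i)][e1(j)] * m2[e2(i)][e2(j)] for j in range(self.d)]
                for i in range(self.d)]

    def mat_extension(self, m):
        """Tensor a matrix on vars with the identity on -vars."""
        d2 = math.prod(self.dims2)
        ident = [[int(i == j) for j in range(d2)] for i in range(d2)]
        return self.tensor_mat(m, ident)


# Three variables of dimensions 2, 3, 2; vars = {0, 2} is not a contiguous block.
ps = PartialState([2, 3, 2], {0, 2})
print(ps.dims1, ps.dims2, ps.d)  # [2, 2] [3] 12
```

Because i ↦ (encode1 i, encode2 i) is a bijection, tensor_mat A B applied to tensor_vec u v equals tensor_vec (A u) (B v), and the trace of tensor_mat A B is the product of the traces, independently of where the variables sit.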


With mat_extension, we can define “partial” versions of quantum program 
commands Utrans, Measure and While. They take a set of variables q as an 
extra parameter, and all matrices involved act on the vector space associated 
to q. These commands are named Utrans_P, Measure_P and While_P. They are 
usually used in place of the global commands in actual applications. 

More generally, we can define the tensor product of vectors and matrices on 
any two subsets of quantum variables. For this, we define another locale: 


locale partial_state2 = state_sig +
fixes vars1 :: nat set and vars2 :: nat set
assumes disjoint: vars1 ∩ vars2 = {}


To make use of tensor_mat to define tensor product in this more general
setting, we need to find the relative position of variables vars1 within vars1 ∪
vars2. This is done using ind_in_set, which counts the position of x within A.


definition ind_in_set A x = card {i. i ∈ A ∧ i < x}
definition vars1' = (ind_in_set (vars1 ∪ vars2)) ` vars1
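A small worked example of ind_in_set and vars1', written in Python with hypothetical variable sets, shows how the positions of vars1 are renumbered relative to vars1 ∪ vars2.

```python
def ind_in_set(a, x):
    """card {i. i in A and i < x}: the position of x within the sorted set A."""
    return sum(1 for i in a if i < x)


vars1, vars2 = {1, 4}, {0, 3}                       # hypothetical disjoint sets
union = vars1 | vars2                               # {0, 1, 3, 4}
vars1_rel = {ind_in_set(union, v) for v in vars1}   # corresponds to vars1'
print(sorted(vars1_rel))                            # [1, 3]
```

Variable 1 is the second element of {0, 1, 3, 4} and variable 4 the fourth, so within the combined list of dimensions dims0 they occupy positions 1 and 3.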
Finally, the more general tensor products are defined as follows (note that since
we are now outside the partial_state locale, we must use qualified names for
tensor_vec and tensor_mat, and supply extra arguments for variables in the locale;
here dims0 = nths dims (vars1 ∪ vars2) is the total list of dimensions).


definition ptensor_vec :: 'a vec ⇒ 'a vec ⇒ 'a vec where
ptensor_vec v1 v2 = partial_state.tensor_vec dims0 vars1' v1 v2


definition ptensor_mat :: 'a mat ⇒ 'a mat ⇒ 'a mat where
ptensor_mat m1 m2 = partial_state.tensor_mat dims0 vars1' m1 m2


The partial extension pmat_extension is defined in a similar way as before. 


definition pmat_extension :: 'a mat ⇒ 'a mat where
pmat_extension m = ptensor_mat m (1m d2)


The definitions ptensor_vec and ptensor_mat satisfy several key consistency 
properties. In particular, they satisfy associativity of tensor product. For matri- 
ces, this is expressed as follows: 


theorem ptensor_mat_assoc:
  v1 ∩ v2 = {} ⟹
  (v1 ∪ v2) ∩ v3 = {} ⟹
  v1 ∪ v2 ∪ v3 ⊆ {0..<length dims} ⟹
  ptensor_mat dims (v1 ∪ v2) v3 (ptensor_mat dims v1 v2 m1 m2) m3 =
  ptensor_mat dims v1 (v2 ∪ v3) m1 (ptensor_mat dims v2 v3 m2 m3)
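In the special case where v1, v2 and v3 are consecutive singleton positions (say {0}, {1}, {2}), ptensor_mat reduces to the Kronecker product, and the theorem specializes to the familiar associativity of the Kronecker product. Here is a quick numeric check of that special case, with a hypothetical kron helper and arbitrary example matrices:

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


m1 = [[1, 2], [3, 4]]
m2 = [[0, 1], [1, 0]]
m3 = [[2, 0, 1], [0, 1, 0], [1, 0, 3]]

left = kron(kron(m1, m2), m3)   # tensor on v1 ∪ v2 first, then with v3
right = kron(m1, kron(m2, m3))  # tensor on v2 ∪ v3 first, then m1 in front
print(left == right)  # True
```

The general theorem is harder precisely because v1, v2 and v3 may be interleaved, so that each side involves different permutations of the global index.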


Together, these constructions and consistency properties provide a framework
in which one can reason about arbitrary tensor products of vectors and matrices
defined on mutually disjoint sets of quantum variables.


3.5 Case Study: Products of Hadamard Matrices 


In this section, we illustrate the above framework for tensor product of matrices 
with an application, to be used in the verification of Grover’s algorithm in the 
next section. 

In many quantum algorithms, we need to deal with the tensor product of 
an arbitrary number of Hadamard matrices. The Hadamard matrix (denoted 
hadamard in Isabelle) is given by: 


$$\mathit{hadamard} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$


For example, in Grover’s algorithm, we need to apply the Hadamard transform
to each of the first n quantum variables, given by vars1. A single Hadamard
transform on the i’th quantum variable, extended to a matrix acting on the first
n quantum variables, is defined as follows:
definition hadamard_on_i :: nat ⇒ complex mat where
  hadamard_on_i i = pmat_extension dims {i} (vars1 - {i}) hadamard


The effect of consecutively applying the Hadamard transform on each of the
first n quantum variables is equivalent to multiplying the quantum state by
exH_k (n − 1), where exH_k is defined as follows.


fun exH_k :: nat ⇒ complex mat where
  exH_k 0 = hadamard_on_i 0
| exH_k (Suc k) = exH_k k * hadamard_on_i (Suc k)


Crucially, this matrix product of extensions of Hadamard matrices must equal 
the tensor product of Hadamard matrices. That is, with H_k defined as 


fun H_k :: nat ⇒ complex mat where
  H_k 0 = hadamard
| H_k (Suc k) = ptensor_mat dims {0..<Suc k} {Suc k} (H_k k) hadamard


we have the theorem 
lemma exH_eq_H: exH_k (n - 1) = H_k (n - 1)


The proof of this result is by induction, requiring the use of associativity of 
tensor product stated above. 
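The statement exH_eq_H can be sanity-checked numerically for small n. The sketch below assumes qubit 0 is the most significant factor. It uses the unnormalised matrix [[1, 1], [1, -1]] so that the comparison is exact, which is harmless because both sides carry the same factor (1/√2)^n. The Python functions only mirror the names of the Isabelle definitions and are hypothetical. A numeric test proves nothing for general n, which is why the formal proof proceeds by induction.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(d):
    return [[1 if i == j else 0 for j in range(d)] for i in range(d)]


# Unnormalised Hadamard, so that all arithmetic stays in exact integers.
H = [[1, 1], [1, -1]]


def hadamard_on_i(i, n):
    """H on qubit i, identity on the other qubits of an n-qubit register."""
    result = [[1]]
    for k in range(n):
        result = kron(result, H if k == i else identity(2))
    return result


def exH_k(k, n):
    """hadamard_on_i 0 * ... * hadamard_on_i k, following the recursion above."""
    m = hadamard_on_i(0, n)
    for i in range(1, k + 1):
        m = matmul(m, hadamard_on_i(i, n))
    return m


def H_k(k):
    """H ⊗ ... ⊗ H with k + 1 factors."""
    m = H
    for _ in range(k):
        m = kron(m, H)
    return m


for n in range(1, 5):
    assert exH_k(n - 1, n) == H_k(n - 1)
```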


4 Verification of Grover’s Algorithm 


In this section, we describe our application of the above framework to the
verification of Grover’s quantum search algorithm [32]. Quantum search algorithms
[18,32] concern searching an unordered database for an item satisfying some
given property. This property is usually specified by an oracle. In a database of
N items, where M items satisfy the property, finding an item with the property
requires on average O(N/M) calls to the oracle on a classical computer. Grover’s
algorithm reduces this complexity to O(√(N/M)).

The basic idea of Grover’s algorithm is rotation. The algorithm starts from an
initial state/vector. At every step, it rotates towards the target state/vector by
a small angle. As summarised in [18,19,42], it can be mathematically described
by the following equation [42, Eq. (6.12)]:


$$G^k|\psi\rangle = \cos\left(\frac{2k+1}{2}\theta\right)|\alpha\rangle + \sin\left(\frac{2k+1}{2}\theta\right)|\beta\rangle,$$


where G represents the operator at each step, |ψ⟩ is the initial state,
θ = 2 arccos √((N − M)/N), |α⟩ is the bad state (for items not satisfying the
property), and |β⟩ is the good state (for items satisfying the property). Thus when
θ is very small, i.e., M ≪ N, it costs O(√(N/M)) rounds to reach a target state.
Originally, Grover’s algorithm only resolved the case M = 1 [32]. It was
immediately generalized to the case of known M using the same idea, and to the case of
unknown M with some modifications [18]. Later, the idea was generalized to
all invertible quantum processes [19].
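The rotation formula above lends itself to a quick numeric check. The sketch below computes the squared amplitude of the good state |β⟩ after k steps, sin²((2k+1)θ/2). The function names are illustrative, and the values are floating-point approximations.

```python
import math


def theta(N, M):
    """Rotation angle θ = 2 arccos sqrt((N - M) / N)."""
    return 2 * math.acos(math.sqrt((N - M) / N))


def success_probability(N, M, k):
    """Squared amplitude of the good state |β> after k Grover steps."""
    return math.sin((2 * k + 1) * theta(N, M) / 2) ** 2


print(success_probability(4, 1, 0))     # approximately M/N = 0.25 before any step
print(success_probability(4, 1, 1))     # approximately 1: one step suffices when N = 4M
print(success_probability(1024, 1, 25)) # close to 1 after about (π/4)·sqrt(N) steps
```

With k = 0 the formula gives M/N, the probability of hitting a good item by guessing, and the probability grows until (2k + 1)θ/2 reaches π/2.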

The paper [61] uses Grover’s algorithm as the main example illustrating 
quantum Hoare logic. We largely follow its approach in this paper. See also [42, 
Chapter 6] for a general introduction. 

First, we set up a locale for the inputs to the search problem.


locale grover_state =
  fixes n :: nat and f :: nat ⇒ bool
  assumes n: n > 1
    and dimM: card {i. i < (2::nat) ^ n ∧ f i} > 0
              card {i. i < (2::nat) ^ n ∧ f i} < (2::nat) ^ n


Here n is the number of qubits used to represent the items. That is, we assume
N = 2^n items in total. The oracle is represented by the function f, where only
its values on inputs less than 2^n are used. The number of items satisfying the
property is given by M = card {i. i < N ∧ f i}.

Next, we set up a locale for Grover’s algorithm.


locale grover_state_sig = grover_state + state_sig +
  fixes R :: nat and K :: nat
  assumes dims_def: dims = replicate n 2 @ [K]
  assumes R: R = π / (2 * θ) - 1 / 2
  assumes K: K > R


As in [61], we assume R = π/(2θ) − 1/2 is an integer. This implies that the
quantum algorithm succeeds with probability 1. This condition holds, for
example, for all N, M with N = 4M. Since we did not formalize quantum states with
infinite dimension, we replace the loop counter, which is infinite dimensional in 
[61], with a variable of dimension K > R. We also remove the control variable 
for the oracle used in [61]. Overall, our quantum state consists of n variables of 
dimension 2 for representing the items, and one variable of dimension K for the 
loop counter. 
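The assumption on R can be checked numerically. The following sketch uses hypothetical helper names and floating-point arithmetic. It evaluates R = π/(2θ) − 1/2 for N = 4M, where θ = π/3 and hence R = 1, and for N = 8, M = 1, where R is not an integer.

```python
import math


def theta(N, M):
    """θ = 2 arccos sqrt((N - M) / N)."""
    return 2 * math.acos(math.sqrt((N - M) / N))


def r_value(N, M):
    """R = π / (2θ) - 1/2, the number of loop iterations assumed in the locale."""
    return math.pi / (2 * theta(N, M)) - 0.5


for n in range(2, 8):
    N = 2 ** n
    print(N, r_value(N, N // 4))  # close to 1 for every N = 4M

print(r_value(8, 1))  # about 1.67: not an integer, so the assumption fails
```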

We now present the quantum program to be verified. First, the operation 
that performs the Hadamard transform on each of the first n variables is defined 
by induction as follows. 


fun hadamard_n :: nat ⇒ com where
  hadamard_n 0 = SKIP
| hadamard_n (Suc i) = hadamard_n i ;; Utrans (tensor_P (hadamard_on_i i) (1⇩m K))


Here tensor_P denotes the tensor product of a matrix on the first n variables
(of dimension 2^n × 2^n) and a matrix on the loop variable (of dimension K × K).
Executing this program is equivalent to multiplying the quantum state
corresponding to the first n variables by H^{⊗n}, as shown in Sect. 3.5.

The body of the loop is given by: 


definition D :: com where
  D = Utrans_P vars1 mat_O ;;
      hadamard_n n ;;
      Utrans_P vars1 mat_Ph ;;
      hadamard_n n ;;
      Utrans_P vars2 (mat_incr n)


where each of the three matrices mat_O, mat_Ph and mat_incr can be defined 
directly. 


definition mat_O :: complex mat where
  mat_O = mat N N (λ(i,j). if i = j then (if f i then 1 else -1) else 0)
definition mat_Ph :: complex mat where
  mat_Ph = mat N N (λ(i,j). if i = j then if i = 0 then 1 else -1 else 0)
definition mat_incr :: nat ⇒ complex mat where
  mat_incr n = mat n n (λ(i,j). if i = 0 then (if j = n - 1 then 1 else 0)
                                else (if i = j + 1 then 1 else 0))
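To make the three matrices concrete, the following Python sketch builds them as lists of rows. It checks that mat_incr acts as a cyclic increment on basis vectors. The helper names apply and basis are hypothetical. Only mat_O, mat_Ph and mat_incr mirror the definitions above.

```python
def mat_O(N, f):
    """Diagonal matrix with 1 where f i holds and -1 elsewhere."""
    return [[(1 if f(i) else -1) if i == j else 0 for j in range(N)]
            for i in range(N)]


def mat_Ph(N):
    """Diagonal matrix with 1 at index 0 and -1 elsewhere, i.e. 2|0><0| - I."""
    return [[(1 if i == 0 else -1) if i == j else 0 for j in range(N)]
            for i in range(N)]


def mat_incr(n):
    """Permutation matrix sending basis vector |j> to |(j + 1) mod n>."""
    return [[(1 if j == n - 1 else 0) if i == 0 else (1 if i == j + 1 else 0)
             for j in range(n)] for i in range(n)]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def basis(k, n):
    """The k'th standard basis vector of dimension n."""
    return [1 if i == k else 0 for i in range(n)]


print(mat_incr(3))  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(all(apply(mat_incr(5), basis(j, 5)) == basis((j + 1) % 5, 5) for j in range(5)))  # True
```

Note that mat_O assigns +1 to good items and −1 to the others. This is the negation of the textbook oracle, which only changes the state by a global phase.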


Finally, Grover’s algorithm is defined as follows. Since we do not have an
initialization command, we skip the initialization to zero at the beginning and instead
assume in the precondition that the state begins in the zero state.


definition Grover :: com where 
Grover = hadamard_n n ;; 
While_P vars2 M0 M1 D ;; 
Measure_P vars1 N testN (replicate N SKIP) 


where the measurements for the while loop and at the end of the algorithm are: 


definition M0 = mat K K (λ(i,j). if i = j ∧ i ≥ R then 1 else 0)
definition M1 = mat K K (λ(i,j). if i = j ∧ i < R then 1 else 0)
definition testN k = mat N N (λ(i,j). if i = k ∧ j = k then 1 else 0)
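The two loop measurements can be illustrated on basis states of the counter. The sketch below assumes, as in quantum Hoare logic, that outcome 0 (M0) terminates the loop and outcome 1 (M1) executes the body. It checks the completeness condition M0†M0 + M1†M1 = I and counts the iterations when the counter starts at |0⟩ and is incremented cyclically. Helper names are hypothetical.

```python
def M0(K, R):
    """Projector onto counter values i >= R (loop exits)."""
    return [[1 if i == j and i >= R else 0 for j in range(K)] for i in range(K)]


def M1(K, R):
    """Projector onto counter values i < R (loop body runs again)."""
    return [[1 if i == j and i < R else 0 for j in range(K)] for i in range(K)]


def is_complete(K, R):
    """M0†M0 + M1†M1 = I; for these real diagonal 0/1 projectors this is M0 + M1 = I."""
    m0, m1 = M0(K, R), M1(K, R)
    return all(m0[i][j] + m1[i][j] == (1 if i == j else 0)
               for i in range(K) for j in range(K))


def loop_iterations(K, R):
    """Follow the counter on basis states, starting from |0>."""
    counter, iterations = 0, 0
    # On a basis state |counter>, outcome 1 occurs with probability 0 or 1.
    while M1(K, R)[counter][counter] == 1:
        counter = (counter + 1) % K  # cyclic increment of the K-dimensional counter
        iterations += 1
    return iterations


print(is_complete(5, 3), loop_iterations(5, 3))  # True 3
```

Because K > R, the counter reaches R before it could wrap around, so the body runs exactly R times.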


We can now state the final correctness result. Let proj v be the outer product
v v†, and proj_k k be |k⟩⟨k|, where |k⟩ is the k’th basis vector of the vector space
corresponding to the loop variable. Let pre and post be given as follows:


definition pre = proj (vec N (λk. if k = 0 then 1 else 0))
definition post = mat N N (λ(i, j). if i = j ∧ f i then 1 else 0)


Then, the (partial) correctness of Grover’s algorithm is specified by the fol- 
lowing Hoare triple. 


theorem grover_partial_correct:
  ⊨p {tensor_P pre (proj_k 0)}
     Grover
     {tensor_P post (1⇩m K)}
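As an informal cross-check of this specification (not a substitute for the Hoare-logic proof), the following sketch simulates the algorithm on the n item qubits only, with floating-point amplitudes. It ignores the loop counter and simply runs the body R times, assuming the loop exits after exactly R iterations. grover_success is a hypothetical helper. For the two instances with N = 4M (where R = 1), the probability of ending in a good item should be 1 up to rounding.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def grover_success(n, f, R):
    """Probability that Grover (without the loop counter) ends in an item with f k."""
    N = 2 ** n
    s = 1 / math.sqrt(2)
    Hn = [[1.0]]
    for _ in range(n):
        Hn = kron(Hn, [[s, s], [s, -s]])
    state = [1.0 if k == 0 else 0.0 for k in range(N)]  # precondition: zero state
    state = apply(Hn, state)                            # hadamard_n n
    for _ in range(R):                                  # R executions of the body D
        state = [a if f(k) else -a for k, a in enumerate(state)]    # mat_O
        state = apply(Hn, state)                                    # hadamard_n n
        state = [a if k == 0 else -a for k, a in enumerate(state)]  # mat_Ph
        state = apply(Hn, state)                                    # hadamard_n n
    # post projects onto the good items: add up their probabilities.
    return sum(abs(a) ** 2 for k, a in enumerate(state) if f(k))


print(grover_success(2, lambda k: k == 3, 1))       # N = 4, M = 1, R = 1
print(grover_success(3, lambda k: k in (1, 6), 1))  # N = 8, M = 2, R = 1
```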


We now briefly outline the proof strategy. Following the definition of Grover, 
the proof of the above Hoare triple is divided into three main parts, for the 
initialization by Hadamard matrices, for the while loop, and for the measurement 
at the end. 


200 J. Liu et al. 


In each part, assertions are first inserted around commands according to the
Hoare rules to form smaller Hoare triples. In particular, the precondition of the
while loop part is exactly the invariant of the loop. Moreover, it has to be shown
that these assertions satisfy the conditions for being quantum predicates, which
involves computing their dimension, showing positivity, and showing that they are bounded
by the identity matrix under the Löwner order. Then, these Hoare triples are
derived using our deduction system. Before combining them, we have
to show that the postcondition of each command is equal to the precondition
of the next one. After that, the three main Hoare triples can be obtained by
combining these smaller ones.

After the derivation of the three Hoare triples above, we prove the Löwner
order between the postcondition of each triple and the precondition of the
following triple. Afterwards, the triples can be combined into the Hoare triple below:


theorem grover_partial_deduct:
  ⊢p {tensor_P pre (proj_k 0)}
     Grover
     {tensor_P post (1⇩m K)}


Finally, the (partial) correctness of Grover’s algorithm follows from the 
soundness of our deduction system. 


5 Discussion 


Compared to classical programs, reasoning about quantum programs is more 
difficult in every respect. Instead of discrete mathematics in the classical case, 
even the simplest reasoning about quantum programs involves complex numbers, 
unitary and positivity properties of matrices, and the tensor product. Hence, it 
is to be expected that formal verification of quantum Hoare logic and quantum 
algorithms will take much more effort. In this section, we describe some of the 
automation that we built to simplify the manual proof, and give some statistics 
concerning the amount of effort involved in the formalization. 


5.1 Automatic Proof of Identities in Linear Algebra 


During the formalization process, we make extensive use of ring properties of 
matrices. These include commutativity and associativity of addition, associativ- 
ity of multiplication, and distributivity. Compared to the usual case of numbers, 
applying these rules for matrices is more difficult in Isabelle/HOL, since they 
involve extra conditions on dimensions of matrices. For example, the rule for 
commutativity of addition of matrices is stated as: 


lemma comm_add_mat:
  A ∈ carrier_mat nr nc ⟹ B ∈ carrier_mat nr nc ⟹ A + B = B + A
These extra conditions make the rules difficult to apply with standard Isabelle
automation. For our work, we implemented our own tactic handling these rules.
In addition to the ring properties, we also frequently need the cyclic
property of the trace (e.g. tr(ABC) = tr(BCA)), as well as the properties of the adjoint
((AB)† = B†A† and A†† = A). For simplicity, we restrict to identities involving
only n × n matrices, where n is a parameter given to the tactic.

The tactic is designed to prove equality between two expressions. It works
by computing the normal form of each expression, using ring identities and
identities for the adjoint to fully expand it into polynomial form.
To handle the trace, the expression tr(A_1 ⋯ A_n) is normalized to put the A_i
that is the largest according to Isabelle’s internal term order last. All dimension
assumptions are collected and reduced (for example, the assumption
A * B ∈ carrier_mat n n is reduced to A ∈ carrier_mat n n and B ∈ carrier_mat n n).
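The trace normalization step can be sketched as follows. Here Python string comparison stands in for Isabelle’s internal term order. Ties between several occurrences of the largest factor are broken by choosing the smallest rotation. Both choices are assumptions of this illustration, not details of the actual tactic, which operates on Isabelle terms.

```python
def normalize_trace(factors):
    """Rotate the factors of tr(A_1 ... A_n) so that a largest factor comes last.

    Ties between equal largest factors are broken by taking the smallest rotation.
    """
    n = len(factors)
    biggest = max(factors)
    rotations = [factors[k + 1:] + factors[:k + 1]
                 for k in range(n) if factors[k] == biggest]
    return min(rotations)


# tr(BCA), tr(CAB) and tr(ABC) all normalise to the same product.
print(normalize_trace(["B", "C", "A"]))  # ['A', 'B', 'C']
```

Two traces that differ only by a cyclic rotation then have identical normal forms, so the cyclic property never has to be applied by hand.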

Overall, the resulting tactic is used 80 times in our proofs. Below, we list some
of the more complicated equations resolved by the tactic. The tactic reduces the
goal to dimensional constraints on the atomic matrices (e.g. M ∈ carrier_mat n n
and P ∈ carrier_mat n n in the first case).


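$$\mathrm{tr}(M M^\dagger (P P^\dagger)) = \mathrm{tr}((P^\dagger M)(P^\dagger M)^\dagger)$$
$$\mathrm{tr}(M_0 A M_0^\dagger) + \mathrm{tr}(M_1 A M_1^\dagger) = \mathrm{tr}((M_0^\dagger M_0 + M_1^\dagger M_1) A)$$
$$H^\dagger (Ph^\dagger (H^\dagger Q_2 H) Ph) H = (H\, Ph\, H)^\dagger Q_2 (H\, Ph\, H)$$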
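Identities like these can be sanity-checked on random matrices before attempting a proof. The sketch below tests the second identity numerically with complex entries. It is a sampled check subject to floating-point error, whereas the tactic proves the identity symbolically for all n × n matrices. All helper names are hypothetical.

```python
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def adjoint(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_matrix(n, rng):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]


def identity_gap(n, seed=0):
    """|lhs - rhs| for tr(M0 A M0†) + tr(M1 A M1†) = tr((M0†M0 + M1†M1) A)."""
    rng = random.Random(seed)
    m0, m1, a = (random_matrix(n, rng) for _ in range(3))
    lhs = (trace(matmul(matmul(m0, a), adjoint(m0)))
           + trace(matmul(matmul(m1, a), adjoint(m1))))
    rhs = trace(matmul(add(matmul(adjoint(m0), m0), matmul(adjoint(m1), m1)), a))
    return abs(lhs - rhs)


print(identity_gap(4))  # tiny, up to floating-point rounding
```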




5.2 Statistics 


Overall, the formalization consists of about 11,500 lines of Isabelle theories. An
old version of the proof was developed on and off over two years. The current version
was redeveloped, reusing some ideas from the old version; its development took
about 5 person-months. A detailed breakdown of the number of lines
for the different parts of the proof is given in the following table.


Description          Files                                 Number of lines
Preliminaries        Complex_Matrix, Matrix_Limit, Gates   4197
Semantics            Quantum_Program                       1110
Hoare logic          Quantum_Hoare                         1417
Tensor product       Partial_State                         1664
Grover's algorithm   Grover                                3184
Total                                                      11572


In particular, with the verification framework in place, the proof of correct- 
ness for Grover’s search algorithm takes just over 3000 lines. While this shows 
that it is realistic to use the current framework to verify more complicated algo- 
rithms such as Shor’s algorithm, it is clear that more automation is needed to 
enable verification on a larger scale. 
6 Related Work


The closest work to our research is Robert Rand’s implementation of Qwire in 
Coq [49,50]. Qwire [47] is a language for describing quantum circuits. In this 
model, quantum algorithms are implemented by connecting together quantum 
gates, each with a fixed number of bit/qubit inputs and outputs. How the gates
are connected is determined by a classical host language, allowing classical con- 
trol of quantum computation. The work [49] defines the semantics of Qwire in 
Coq, and uses it to verify quantum teleportation, Deutsch’s algorithm, and an 
example on multiple coin flips to illustrate applicability to a family of circuits. In 
this framework, program verification proceeds directly from the semantics, with- 
out defining a Hoare logic. As in our work, it is necessary to solve the problem 
of how to define extensions of an operation on a few qubits to the global state. 
The approach taken in [49] is to use the usual Kronecker product, augmented 
either by the use of swaps between qubits, or by inserting identity matrices at 
strategic positions in the Kronecker product. 

There are two main differences between [49] and our work. First, quantum 
algorithms are expressed using quantum circuits in [49], while we use quantum 
programs with while loops. Models based on quantum circuits have the advan- 
tage of being concrete, and indeed most of the earlier quantum algorithms can be 
expressed directly in terms of circuits. However, several new quantum algorithms 
can be more properly expressed by while loops, e.g. quantum walks with absorb- 
ing boundaries, quantum Bernoulli factory (for random number generation), 
HHL for systems of linear equations and qPCA (Principal Component Analy- 
sis). Second, we formalized a Hoare logic while [49] uses denotational semantics 
directly. As in verification of classical programs, Hoare logic encapsulates stan- 
dard forms of argument for dealing with each program construct. Moreover, 
the rules for QHL are in weakest-precondition form, allowing the possibility of
automated verification condition generation after specifying the loop invariants 
(although this is not used in the present paper). 

Besides Rand’s work, quite a few verification tools have been developed for 
quantum communication protocols. For example, Nagarajan and Gay [41] mod- 
eled the BB84 protocol [12] and verified its correctness. Ardeshir-Larijani et al. 
[7,8] presented a tool for verification of quantum protocols through equivalence 
checking. Existing tools, such as PRISM [37] and Coq, are employed to develop 
verification tools for quantum protocols [17,29]. Furthermore, an automatic tool
called Quantum Model-Checker (QMC) has been developed [28,46].

Recently, several specific techniques have been proposed to algorithmically
check properties of quantum programs. In [63], the Sharir-Pnueli-Hart method
for verifying probabilistic programs [54] has been generalised to quantum programs
by exploiting the Schrödinger-Heisenberg duality between quantum states
and observables. Termination analysis of nondeterministic and concurrent quantum
programs [38] was carried out based on reachability analysis [64]. Invariants
can be generated at some steps in quantum programs for debugging and verification
of correctness [62]. But up to now no tools are available that implement
these techniques. Another Hoare-style logic for quantum programs was proposed
in [86], but without (relative) completeness.

Interactive theorem proving has made significant progress in the formal ver- 
ification of classical programs and systems. Here, we focus on listing some tools 
designed for special kinds of systems. EasyCrypt [10,11] is an interactive frame- 
work for verifying the security of cryptographic constructs in the computational 
model. It is developed based on a probabilistic relational Hoare logic to support 
machine-checked construction and verification of game-based proofs. Recently, 
verification of hybrid systems via interactive theorem proving has also been stud- 
ied. KeYmaera X [26] is a theorem prover implementing differential dynamic 
logic (dL) [48] for the verification of hybrid programs. In [60], a prover has been
implemented in Isabelle/HOL for reasoning about hybrid processes described 
using hybrid CSP [34]. 

Our work is based on existing formalizations of matrices and tensors in
Isabelle/HOL. In [59] (with corresponding AFP entry [58]), Thiemann et al. 
developed the matrix library that we use here. In [14] (with corresponding AFP 
entry [13]), Bentkamp et al. developed tensor analysis based on the above work, 
in an effort to formalize an expressivity result of deep learning algorithms. 


7 Conclusion 


We formalized quantum Hoare logic in Isabelle/HOL, and verified the soundness 
and completeness of the deduction system for partial correctness. Using this 
deduction system, we verified the correctness of Grover’s search algorithm. This 
is, to the best of our knowledge, the first formalization of a Hoare logic for quantum
programs in an interactive theorem prover. 

This work is intended to be the first step of a larger project to construct 
a framework under which one can efficiently verify the correctness of complex 
quantum programs and systems. In this paper, our focus is on formalizing the 
mathematical machinery to specify the semantics of quantum programs, and 
prove the correctness of quantum Hoare logic. To verify more complicated
programs efficiently, better automation is needed at every stage of the proof. We
have already begun with some automation for proving identities in linear
algebra. In the future, we plan to extend it with automation for matrix
computations, tensor products, positivity of matrices, etc., all linked together
by a verification condition generator.

Another direction of future work is to formalize various extensions of quan- 
tum Hoare logic, to deal with classical control, recursion, concurrency, etc., with 
the eventual goal of being able to verify not only sequential programs, but also 
concurrent programs and communication systems. 
Abstract. We present SECCSL, a concurrent separation logic for proving
expressive, data-dependent information flow security properties of
low-level programs. SECCSL is considerably more expressive, while being
simpler, than recent compositional information flow logics that cannot
reason about pointers, arrays etc. To capture security concerns, SECCSL
adopts a relational semantics for its assertions. At the same time it inherits
the structure of traditional concurrent separation logics; thus SECCSL
reasoning can be automated via symbolic execution. We demonstrate this
by implementing SECC, an automatic verifier for a subset of the C programming
language, which we apply to a range of benchmarks.


1 Introduction 


Software verification successes abound, whether via interactive proof or via auto- 
matic program verifiers. While the former has yielded individual, deeply verified 
software artifacts [21,24,25] primarily by researchers, the latter appears to be 
having a growing impact on industrial software engineering [11,36,39]. 

At the same time, recent work has heralded major advancements in program
logics for reasoning about secure information flow [23,33,34] (i.e. whether
programs properly protect their secrets), yielding the first general program logics
and proofs of information flow security for non-trivial concurrent programs [34].
Yet so far, such logics have remained confined to interactive proof assistants, 
making them practically inaccessible to industrial developers. 

This is not especially surprising. The COVERN logic [34], for example, pays for 
its generality with regard to expressive security policies, in terms of complexity. 
Worse, these logics reason only over very simple toy programming languages, 
which even lack support for pointers, arrays, and structures. Their complexity, we 
argue, hinders proof automation and makes scaling up these logics to real-world 
languages impractical. How, therefore, can we leverage the power of existing 
automatic deductive verification approaches for security proofs? 

In this paper we present Security Concurrent Separation Logic (SECCSL), 
which achieves an unprecedented combination of simplicity, power, and ease of 
automation by capturing core concepts such as data-dependent variable
sensitivity [27,31,50], and shared invariants on sensitive memory [34] in the familiar
style of Concurrent Separation Logic (CSL) [38], as exemplified in Sect. 2.

Prior work [14,20] has noted the promise of separation logic for reasoning 
about information flow yet, to date, that promise remains unrealised. Indeed, 
the only two prior encodings of information flow concepts into separation logics 
which we are aware of have overlooked crucial features like concurrency [14], 
and lack the ability to separately specify the sensitivity of values and memory 
locations as we explain in Sect. 2. The logic in [20] lacks soundness arguments
altogether, while [14] fails to satisfy basic properties needed for automation (see
the discussion following Proposition 1).

Designing a logic with the right combination of features, with the right
semantics, is therefore non-trivial. To manage this, SECCSL assertions have a
relational interpretation [6,49] over a standard heap model (Sect. 3). This allows
one to canonically encode information flow concepts while maintaining the
approach and structure of traditional CSL proofs. To do so, we adapt existing proof
techniques for the soundness of CSL [46] into a compositional information flow
security property (Sect. 4) that, like SECCSL itself, is simple and powerful. We
have mechanized the soundness of SECCSL in Isabelle/HOL [37].

To demonstrate SECCSL’s ease of use and capacity for automation, we
implemented the prototype tool SECC (Sect. 5). We target C because it dominates
low-level security-critical code. SECC automates SECCSL reasoning via sym- 
bolic execution, in the style of contemporary Separation Logic program verifiers 
like VeriFast [22], Viper [30], and Infer [10]. SECC correctly analyzes well-known 
benchmark problems (collected in [17]) within a few milliseconds; and we verify 
a variant of the CDDC case study [5] from the COVERN project. Our Isabelle 
theories, the open source prototype tool SECC, and examples are available online 
at https://covern.org/secc [18]. 


2 An Overview of SECCSL 


2.1 Specifying Information Flow Control in SECCSL 


Consider the program in Fig. 1. It maintains a global pointer rec to a shared 
record, protected by the lock mutex. The is_classified field of the record iden- 
tifies the confidentiality of the record’s data: when is_classified is true, the 
value stored in the data field is confidential, and otherwise it is safe to release 
publicly. The left thread outputs the data in the record whenever it is public 
by writing to the (memory mapped) output device register pointer OUTPUT_REG 
(here also protected by mutex). The right thread updates the record, ensuring 
its content is not confidential, here by clearing its data. 

Suppose assigning a value d to the OUTPUT_REG register causes d to be output
to a publicly-visible location. Reasoning, then, that the example is secure
requires capturing that (1) the data field of the record pointed to by rec is con- 
fidential precisely when the record’s is_classified field says it is, and (2) data 
/* globals shared between the two threads */
struct record { bool is_classified; int data; };
struct record * rec = /* ... initialisation omitted ... */;
volatile int * const OUTPUT_REG = /* memory-mapped IO device register */;

/* thread 1: output the record */
while(true) {
  lock(mutex);
  if (!rec->is_classified)
    *OUTPUT_REG = rec->data;
  unlock(mutex);
}

/* thread 2: edit the record */
lock(mutex);
/* clear the record */
rec->is_classified = FALSE;
rec->data = 0;
unlock(mutex);


Fig. 1. Example of concurrent information flow.


sink OUTPUT_REG should never have confidential data written to it. Therefore the 
example only ever writes non-confidential data into OUTPUT_REG. 

Condition (1) specifies the sensitivity of a data value in memory, whereas 
condition (2) specifies the sensitivity of the data that a memory location (i.e. data 
sink) is permitted to hold. Prior security separation logics [14,20] reason only 
about value-sensitivity condition (1) but, as we explain below, both are needed. 
Like those prior logics, in SECCSL one specifies the sensitivity of the value
denoted by an expression e via a security label ℓ: the assertion e :: ℓ means that
the sensitivity of the value denoted by expression e is at most ℓ. Security labels
are drawn from a lattice with top element high (denoting the most confidential
information), bottom element low (denoting public information), and ordered
via ⊑: ℓ ⊑ ℓ' means that information labelled with ℓ' is at least as sensitive as
that labelled by ℓ. Using this style of assertion, in conjunction with standard
separation logic connectives (explained below), condition (1) can be specified as:


$$\exists c\, d.\; \mathit{rec} \mapsto (c, d) \land c :: \mathit{low} \land d :: (c\ ?\ \mathit{high} : \mathit{low}) \qquad (1)$$


Separation logic’s points-to predicate e ↦ e' means the memory location
denoted by expression e holds the value denoted by e’. Thus (1) can be read 
as saying that the rec pointer points to a pair of values (c,d). The first c (the 
value of the is_classified field) is public. The sensitivity of the second d (the 
value of the data field) is given by the value of the first c: it is high when c 
is true and is low otherwise. SECCSL integrates such reasoning about value- 
dependent sensitivity [27,31,50] neatly with functional properties of low-level 
data structures, which we think is more natural and straightforward than the 
approach of [34,35] that keeps the two concerns separate. 
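A minimal sketch of value-dependent sensitivity, assuming the two-point lattice low ⊑ high: the label required for the data field is computed from is_classified as in (1). The helper names are hypothetical. The check covers a single record and does not model SECCSL’s relational semantics.

```python
LOW, HIGH = "low", "high"


def flows_to(l1, l2):
    """l1 ⊑ l2 in the two-point lattice low ⊑ high."""
    return l1 == LOW or l2 == HIGH


def data_label(is_classified):
    """Label that assertion (1) assigns to the data field: c ? high : low."""
    return HIGH if is_classified else LOW


def may_write_to_public_sink(record):
    """Can record['data'] be stored in a location labelled low?"""
    return flows_to(data_label(record["is_classified"]), LOW)


print(may_write_to_public_sink({"is_classified": False, "data": 7}))  # True
print(may_write_to_public_sink({"is_classified": True, "data": 7}))   # False
```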

Value-sensitivity assertion e :: ℓ is a judgement on the maximum sensitivity
of the data source(s) from which e has been derived. Location-sensitivity
assertions, on the other hand, are used to specify security policies on data sinks like
OUTPUT_REG. These assertions augment the separation logic points-to predicate
with a security label ℓ, and are used to specify which parts of the memory are
observable to the attacker (and so must never contain sensitive information):
e ↦^ℓ e' means that the value denoted by the expression e' is present in memory
at the location denoted by e, and additionally that at all times the sensitivity of
the value stored in that location is never allowed to exceed ℓ. Thus in SECCSL,
e ↦ e' abbreviates e ↦^high e'. In Fig. 1, that OUTPUT_REG is publicly-observable
can be specified as:


$$\exists v.\; \mathit{OUTPUT\_REG} \overset{\mathit{low}}{\mapsto} v \qquad (2)$$


2.2 Reasoning in SECCSL 


SECCSL judgements have the form: 


$$\ell_A \vdash \{P\}\ c\ \{Q\} \qquad (3)$$


Here ℓ_A is the attacker security level, c is the (concurrent) program
command being executed, and P and Q are the program’s pre- resp. postcondition.
Judgement (3) means that if the program c begins in a state satisfying its
precondition P then, when it terminates, the final state will satisfy its postcondition Q.
Analogously to [44], the program is guaranteed to be memory safe. We defer a
description of ℓ_A and the implied security property to Sect. 2.3.

As with traditional CSLs, SECCSL is geared towards reasoning over shared- 
memory programs that use lock-based synchronisation. Each lock l has an
associated invariant inv(l), which is simply a predicate, like P or Q in (3), that
describes the shared memory that the lock protects. In Fig. 1, where the lock 
mutex protects the shared pointer rec and OUTPUT_REG, the associated invariant 
inv(mutex) is simply the conjunction of (1) and (2). 


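$$(\exists c\, d.\; \mathit{rec} \mapsto (c, d) \land c :: \mathit{low} \land d :: (c\ ?\ \mathit{high} : \mathit{low})) \star (\exists v.\; \mathit{OUTPUT\_REG} \overset{\mathit{low}}{\mapsto} v) \qquad (4)$$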
Separating conjunction P ⋆ Q asserts that the assertions P and Q both hold and,
additionally, that the memory locations referenced by P and Q respectively do 
not overlap. Thus SECCSL invariants, like SECCSL assertions, describe together 
both functional properties (e.g. rec is a valid pointer) and security concerns (e.g. 
the OUTPUT_REG location is publicly visible) of the shared state. 

When acquiring a lock one gets to assume that the lock’s invariant holds [38]. 
Subsequently, when releasing the lock one must prove that the invariant has been 
re-established. For example, when reasoning about the code of the left-thread in 
Fig. 1, upon acquiring the mutex, SECCSL adds formula (4) to the intermediate 
assertion, which allows proving that the loop body is secure. When reasoning 
about the right thread, one must prove that the invariant has been re-established 
when it releases the mutex. This is the reason e.g. that the right thread must 
clear the data field after setting is_classified to false. 
Reasoning in SECCSL proceeds forward over the program text according to
the rules in Fig. 4. When execution forks, as in Fig. 1, one reasons over each
thread individually. For Fig. 1, SECCSL requires proving that the guard of the
if-condition is low, i.e. that the program is not branching on a secret (rule
IF in Fig. 4), which would correspond to a timing channel, see Sect. 2.3 below.
This follows from the part c :: low of invariant (4). Secondly, after the write
to OUTPUT_REG, SECCSL requires that the expression that is being written to
the location OUTPUT_REG has sensitivity low (rule STORE in Fig. 4). This follows
from d :: (c ? high : low) in the invariant, which simplifies to d :: low given that the
guard ¬c of the if-statement holds (i.e. c = false). Finally, when the right thread releases mutex,
invariant (4) holds for the updated contents of rec (rule UNLOCK in Fig. 4).


2.3 Security Intuition and Informal Security Property 


But what does security mean in SECCSL? Indeed, a SECCSL judgement
ℓ_A ⊢ {P} c {Q} additionally implies that the program c does not leak any
sensitive information during its execution to potential attackers.

The attacker security level ℓ_A in (3) represents an upper bound on the parts of
the program’s memory that a potential, passive attacker is assumed to be able
to observe before, during, and after the program’s execution. Intuitively this
encompasses all memory locations whose sensitivity is ⊑ ℓ_A. Which memory
locations have sensitivity ⊑ ℓ_A is defined by the location-sensitivity assertions
in the precondition P and the lock invariants: a memory location loc is visible
to the ℓ_A attacker iff P or a lock invariant contains some e ↦^{e_ℓ} e' and in the
program’s initial state e evaluates to loc and e_ℓ evaluates to some label ℓ such
that ℓ ⊑ ℓ_A (see Fig. 3).

Which data is sensitive and should not be leaked to the $\ell_A$ attacker is defined
by the value-sensitivity assertions in $P$ and the lock invariants: an expression $e$
is sensitive when $P$ or a lock invariant contains some $e :: e_\ell$ and, in the program’s
initial state, $e_\ell$ evaluates to some $\ell$ with $\ell \not\sqsubseteq \ell_A$. Security, then, requires that in all
intermediate states of the program’s execution no sensitive data (as defined by
value-sensitivity assertions) can be inferred via the attacker-observable memory
(as defined by location-sensitivity assertions).

SECCSL proves a compositional security property that formalises this intuition
(see Definition 3). Since the property needs to be compositional with
respect to concurrent execution, the resulting security property is timing sensitive:
not only must the program never reveal sensitive data into
attacker-observable memory locations, but the times at which it updates these
memory locations must not depend on sensitive data either. It is well known that
timing-insensitive security properties are not compositional under standard scheduling
models [34,48]. For this reason SECCSL forbids programs from branching on
sensitive values. We believe that this restriction could in principle be relaxed in
the future via established techniques [28,29].

SECCSL’s top-level soundness (Sect. 4) formalises the above intuitive defi- 
nition of security in the style of traditional noninterference [19] that compares 
two program executions with respect to the observations that can be made by 
an attacker. SECCSL adopts a relational interpretation for the assertions P
and Q, and the lock invariants, in which they are evaluated against pairs of exe- 
cution states. This relational semantics directly expresses the comparison needed 
for noninterference. As a result, most of the complexities related to SECCSL’s 
soundness are confined to the semantic level, whereas the calculus retains its 
similarity to standard separation logic and hence its simplicity. 

Under this relational semantics (see Fig. 2 in Sect. 3), when a pair of states
satisfies an assertion $P$, the two states agree on the values of all
non-sensitive expressions as defined by $P$ (Lemma 1). Noninterference is then
stated as Theorem 2: program $c$ with precondition $P$ is secure against the $\ell_A$-attacker
if, whenever it is executed twice from two initial states jointly satisfying $P$
and the lock invariants (and so agreeing on the values of all data assumed to be
initially observable to the $\ell_A$ attacker), then in all intermediate pairs of states reached
after running each execution for the same number of steps, the resulting states
again agree on the initially $\ell_A$-visible memory. This definition is timing sensitive
as it compares executions that have taken the same number of steps.


3 The Logic SECCSL 


3.1 Assertions 


Pure expressions $e$, which do not depend on the heap, are composed of variables $x$,
function applications, equations, and conditional expressions. Pure relational
formulas $\rho$ comprise boolean expressions $\phi$, value sensitivity $e :: e_\ell$, and relational
implication $\Longrightarrow$ (w.l.o.g. covering relational $\neg$, $\wedge$, $\vee$). We assume a standard
first-order many-sorted typing discipline (not elaborated).


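$$e ::= x \mid f(e_1, \dots, e_n) \mid e_1 = e_2 \mid \phi\ ?\ e_1 : e_2 \qquad\qquad \rho ::= \phi \mid e :: e_\ell \mid \rho_1 \Longrightarrow \rho_2$$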


We postulate that the logical signature contains a sort $\mathit{Label}$, corresponding to
the security lattice, with constants $\mathit{low}, \mathit{high} : \mathit{Label}$ and a binary predicate
symbol $\sqsubseteq\ : \mathit{Label} \times \mathit{Label} \to \mathit{Bool}$, whose interpretation satisfies the lattice axioms.
SECCSL’s assertions $P, Q$ may additionally refer to the heap and thus include
the empty heap description, labelled points-to predicates (heap location sensitivity
assertions), assertions guarded by (pure) conditionals, ordinary overlapping
conjunction as well as separating conjunction, and existential quantification.


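$$P, Q ::= \rho \mid \mathsf{emp} \mid e_p \overset{e_\ell}{\mapsto} e_v \mid (\phi\ ?\ P : Q) \mid P \wedge Q \mid P \star Q \mid \exists x.\ P$$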


Disjunction, negation, and implication are excluded because they cause issues
for describing the set of heap locations visible to the $\ell$-attacker, similarly to the
problem of defining heap footprints for non-precise assertions [26,40,41]. These
connectives can still occur between pure and relational expressions.

The standard expression semantics $[\![e]\!]_s$ evaluates $e$ over a store $s$, which
assigns values to variables $x$ as $s(x)$. The interpretation $f^{\mathcal{A}}$ of a function symbol $f$
is a function, given statically by a logical structure $\mathcal{A}$. Specifically, $\sqsubseteq^{\mathcal{A}}$ is the
semantic ordering of the security lattice. We write $s \models \phi$ if $[\![\phi]\!]_s = \mathit{true}$.
The relational semantics of assertions, written $(s,h),(s',h') \models_\ell P$, is defined
in Fig. 2 over two states $(s,h)$ and $(s',h')$, each consisting of a store and a
heap. The semantics is defined against the attacker security level $\ell$ (called $\ell_A$ in
Sect. 2.3). Stores $s$ and $s'$ are related via $e :: e_\ell$: we require the expression $e_\ell$
denoting the sensitivity to coincide on $s$ and $s'$, and whenever $[\![e_\ell]\!]_s \sqsubseteq^{\mathcal{A}} \ell$
holds, $e$ must evaluate to the same value in both states, (7). Heaps are related
by $(s,h),(s',h') \models_\ell e_p \overset{e_\ell}{\mapsto} e_v$, which similarly ensures that the two heap
fragments are identical, $h = h'$, when $e_\ell$ says so, (9). Conditional assertions $\phi\ ?\ P : Q$
evaluate to $P$ when $\phi$ holds (relationally), and to $Q$ otherwise. The separating
conjunction splits both heaps independently, (12). Similarly, the existential
quantifier picks two values $v$ and $v'$, (13). Whether the parts of the split, respectively
these two values, actually agree depends on the other assertions made.


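Using the abbreviation $s,h \models e_p \mapsto e_v \iff h = \{[\![e_p]\!]_s \mapsto [\![e_v]\!]_s\}$:

$$
\begin{aligned}
(s,h),(s',h') \models_\ell \mathsf{emp} &\iff h = h' = \emptyset && (5)\\
(s,h),(s',h') \models_\ell \phi &\iff s \models \phi \text{ and } s' \models \phi && (6)\\
(s,h),(s',h') \models_\ell e :: e_\ell &\iff [\![e_\ell]\!]_s = [\![e_\ell]\!]_{s'} \text{ and } \big([\![e_\ell]\!]_s \sqsubseteq^{\mathcal{A}} \ell \implies [\![e]\!]_s = [\![e]\!]_{s'}\big) && (7)\\
(s,h),(s',h') \models_\ell \rho_1 \Longrightarrow \rho_2 &\iff (s,h),(s',h') \models_\ell \rho_1 \text{ implies } (s,h),(s',h') \models_\ell \rho_2 && (8)\\
(s,h),(s',h') \models_\ell e_p \overset{e_\ell}{\mapsto} e_v &\iff s,h \models e_p \mapsto e_v \text{ and } s',h' \models e_p \mapsto e_v\\
&\qquad \text{and } (s,h),(s',h') \models_\ell e_p :: e_\ell \wedge e_v :: e_\ell && (9)\\
(s,h),(s',h') \models_\ell (\phi\ ?\ P : Q) &\iff \begin{cases} (s,h),(s',h') \models_\ell P, & \text{if } s \models \phi \text{ and } s' \models \phi\\ (s,h),(s',h') \models_\ell Q, & \text{otherwise} \end{cases} && (10)\\
(s,h),(s',h') \models_\ell P \wedge Q &\iff (s,h),(s',h') \models_\ell P \text{ and } (s,h),(s',h') \models_\ell Q && (11)\\
(s,h),(s',h') \models_\ell P \star Q &\iff \text{there are disjoint sub-heaps } h_1, h_2 \text{ and } h_1', h_2' \text{ with } h = h_1 \uplus h_2 \text{ and } h' = h_1' \uplus h_2'\\
&\qquad \text{such that } (s,h_1),(s',h_1') \models_\ell P \text{ and } (s,h_2),(s',h_2') \models_\ell Q && (12)\\
(s,h),(s',h') \models_\ell \exists x.\ P &\iff \text{there are values } v, v' \text{ such that } (s(x := v), h),(s'(x := v'), h') \models_\ell P && (13)
\end{aligned}
$$

Fig. 2. Relational semantics of assertions.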
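As an illustration of the relational reading of Fig. 2, the Python sketch below evaluates a fragment of it, namely rules (7), (9), (11) and (12), on explicit pairs of stores and heaps. It is a hypothetical model under stated assumptions, not part of SECCSL or SECC: the two-point lattice low ⊑ high, the encoding of expressions as Python functions of a store, and the names `Sens`, `PointsTo`, `And`, `Star` and `sat` are all introduced here for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Any, Callable, Dict

LOW, HIGH = "low", "high"
Expr = Callable[[Dict[str, Any]], Any]  # pure expression: store -> value


def leq(a: str, b: str) -> bool:
    """Assumed two-point security lattice: low ⊑ high."""
    return a == b or (a == LOW and b == HIGH)


@dataclass
class Sens:        # e :: e_l
    e: Expr
    el: Expr


@dataclass
class PointsTo:    # e_p |->^{e_l} e_v
    ep: Expr
    el: Expr
    ev: Expr


@dataclass
class And:         # P ∧ Q (overlapping conjunction)
    p: Any
    q: Any


@dataclass
class Star:        # P ⋆ Q (separating conjunction)
    p: Any
    q: Any


def splits(h: dict):
    """Enumerate all splits of heap h into two disjoint sub-heaps."""
    keys = list(h)
    for r in range(len(keys) + 1):
        for left in combinations(keys, r):
            yield ({k: h[k] for k in left},
                   {k: h[k] for k in keys if k not in left})


def sat(s, h, s2, h2, P, l) -> bool:
    """(s,h),(s2,h2) |=_l P for the fragment above."""
    if isinstance(P, Sens):          # rule (7): labels agree; values agree if visible
        lab, lab2 = P.el(s), P.el(s2)
        return lab == lab2 and (not leq(lab, l) or P.e(s) == P.e(s2))
    if isinstance(P, PointsTo):      # rule (9): singleton heaps plus label checks
        return (h == {P.ep(s): P.ev(s)} and h2 == {P.ep(s2): P.ev(s2)}
                and sat(s, h, s2, h2, Sens(P.ep, P.el), l)
                and sat(s, h, s2, h2, Sens(P.ev, P.el), l))
    if isinstance(P, And):           # rule (11)
        return sat(s, h, s2, h2, P.p, l) and sat(s, h, s2, h2, P.q, l)
    if isinstance(P, Star):          # rule (12): split both heaps independently
        return any(sat(s, a, s2, b, P.p, l) and sat(s, c, s2, d, P.q, l)
                   for a, c in splits(h) for b, d in splits(h2))
    raise TypeError(f"unsupported assertion {P!r}")


def const(v):
    return lambda st: v


def var(x):
    return lambda st: st[x]


# Location 0 holds y at level low; x is declared high.
P = Star(PointsTo(const(0), const(LOW), var("y")), Sens(var("x"), const(HIGH)))
print(sat({"x": 1, "y": 5}, {0: 5}, {"x": 2, "y": 5}, {0: 5}, P, LOW))  # True
print(sat({"x": 1, "y": 5}, {0: 5}, {"x": 1, "y": 6}, {0: 6}, P, LOW))  # False
```

Passing `HIGH` as the attacker level instead makes the conjunct $x :: \mathit{high}$ demand agreement on x as well, so the first call would then return False: the same assertion is read differently by attackers at different levels.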


To capture strong security properties, we require a declarative specification
of which heap locations are considered visible to the $\ell$-attacker when assertion $P$
holds in some (initial) state (see Sect. 2.3). We define this set in Fig. 3, denoted
$\mathit{lows}_\ell(P, s)$ for initial store $s$. Note that, by design, the definition does not give a
useful result for an existential like $\exists p\, v.\ p \overset{\ell}{\mapsto} v$. This mirrors the usual difficulty
of defining footprints for non-precise separation logic assertions [26,40,41]. This
restriction is not an issue in practice, as location sensitivity assertions $e_p \overset{e_\ell}{\mapsto} e_v$
are intended to describe the static regions of memory (data sinks) visible to the
attacker, for which existential quantification over variables free in $e_p$ or $e_\ell$ is not
necessary. A generalization to all precise predicates should be possible.

$$
\begin{aligned}
\mathit{lows}_\ell(\rho, s) &= \emptyset, \quad \text{notably } \mathit{lows}_\ell(e :: e_\ell, s) = \emptyset\\
\mathit{lows}_\ell(P \star Q, s) = \mathit{lows}_\ell(P \wedge Q, s) &= \mathit{lows}_\ell(P, s) \cup \mathit{lows}_\ell(Q, s)\\
\mathit{lows}_\ell(e_p \overset{e_\ell}{\mapsto} e_v, s) &= \begin{cases} \{[\![e_p]\!]_s\}, & [\![e_\ell]\!]_s \sqsubseteq^{\mathcal{A}} \ell\\ \emptyset, & \text{otherwise} \end{cases}\\
\mathit{lows}_\ell(\phi\ ?\ P : Q, s) &= \begin{cases} \mathit{lows}_\ell(P, s), & s \models \phi\\ \mathit{lows}_\ell(Q, s), & \text{otherwise} \end{cases}\\
\mathit{lows}_\ell(\exists x.\ P, s) &= \begin{cases} \mathit{lows}_\ell(P, s), & \forall v.\ \mathit{lows}_\ell(P, s) = \mathit{lows}_\ell(P, s(x := v))\\ \emptyset, & \text{otherwise} \end{cases}
\end{aligned}
$$

Fig. 3. Low locations of an assertion.
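Fig. 3 translates almost literally into a recursive function. The sketch below is hypothetical and only illustrative: it assumes a two-point lattice, treats all pure and relational formulas as a single `Pure` case, lets a single class `Star` stand for both $P \star Q$ and $P \wedge Q$ (which Fig. 3 treats identically), and approximates the quantifier $\forall v$ in the existential case by a small finite value domain `DOMAIN`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet

LOW, HIGH = "low", "high"
Expr = Callable[[Dict[str, Any]], Any]
DOMAIN = range(4)  # hypothetical finite value domain standing in for "for all v"


def leq(a, b):
    """Assumed two-point security lattice: low ⊑ high."""
    return a == b or (a == LOW and b == HIGH)


@dataclass
class Pure:        # any pure or relational formula rho, e.g. e :: e_l
    pass


@dataclass
class PointsTo:    # e_p |->^{e_l} e_v
    ep: Expr
    el: Expr
    ev: Expr


@dataclass
class Star:        # P ⋆ Q; also used for P ∧ Q, which lows treats the same way
    p: Any
    q: Any


@dataclass
class Cond:        # phi ? P : Q
    phi: Expr
    p: Any
    q: Any


@dataclass
class Exists:      # ∃x. P
    x: str
    p: Any


def lows(P, s, l) -> FrozenSet:
    """Locations visible to the l-attacker, following Fig. 3."""
    if isinstance(P, Pure):
        return frozenset()
    if isinstance(P, PointsTo):
        return frozenset({P.ep(s)}) if leq(P.el(s), l) else frozenset()
    if isinstance(P, Star):
        return lows(P.p, s, l) | lows(P.q, s, l)
    if isinstance(P, Cond):
        return lows(P.p, s, l) if P.phi(s) else lows(P.q, s, l)
    if isinstance(P, Exists):
        # Useful only if the result does not depend on the bound variable.
        results = {lows(P.p, {**s, P.x: v}, l) for v in DOMAIN}
        return next(iter(results)) if len(results) == 1 else frozenset()
    raise TypeError(f"unsupported assertion {P!r}")


def const(v):
    return lambda st: v


def var(x):
    return lambda st: st[x]


OUT = PointsTo(const(100), const(LOW), var("v"))     # a low data sink
print(lows(Exists("v", OUT), {}, LOW))                # frozenset({100})
print(lows(Exists("p", PointsTo(var("p"), const(LOW), const(0))), {}, LOW))  # frozenset()
```

The two calls mirror the discussion above: quantifying over the stored value keeps the sink location 100 visible, while quantifying over the location itself yields no useful result.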


3.2 Entailments 


Although implications between spatial formulas are not part of the assertion
language, entailments $P \Longrightarrow_\ell Q$ between assertions still play a role in SECCSL’s
Hoare-style consequence rule (CONSEQ in Fig. 4). We discuss entailment now as
it sheds useful light on some consequences of SECCSL’s relational semantics.


Definition 1 (Secure Entailment). $P \Longrightarrow_\ell Q$ holds iff

- $(s,h),(s',h') \models_\ell P$ implies $(s,h),(s',h') \models_\ell Q$ for all $s,h$ and $s',h'$, and
- $\mathit{lows}_\ell(P, s) \subseteq \mathit{lows}_\ell(Q, s)$ for all $s$.


The security level $\ell$ is used not just in the evaluation of the assertions but also
to preserve the $\ell$-attacker visible locations of $P$ in $Q$. This reflects the intuition
that $P$ is stronger than $Q$, and so $Q$ should make fewer assumptions than $P$ about
the limitations of an attacker’s observational powers.
Proposition 1.


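$$
\begin{aligned}
&e = e' \wedge e_\ell = e_\ell' \wedge e :: e_\ell \Longrightarrow_\ell e' :: e_\ell' && (14)\\
&e :: e_\ell \wedge e_\ell \sqsubseteq e_\ell' \wedge e_\ell' :: \ell \Longrightarrow_\ell e :: e_\ell' && (15)\\
&e_\ell :: \ell \Longrightarrow_\ell c :: e_\ell \quad \text{for a constant } c && (16)\\
&e_1 :: e_\ell \wedge \dots \wedge e_n :: e_\ell \Longrightarrow_\ell f(e_1, \dots, e_n) :: e_\ell \quad \text{for } n > 0 && (17)\\
&e_p \overset{e_\ell}{\mapsto} e_v \Longrightarrow_\ell e_p \overset{e_\ell}{\mapsto} e_v \wedge e_p :: e_\ell \wedge e_v :: e_\ell && (18)\\
&(\forall s.\ \mathit{lows}_\ell(P, s) = \mathit{lows}_\ell(Q, s)) \text{ implies } \phi \wedge (\phi\ ?\ P : Q) \Longrightarrow_\ell P && (19)\\
&P \Longrightarrow_\ell P' \text{ and } Q \Longrightarrow_\ell Q' \text{ implies } P \star Q \Longrightarrow_\ell P' \star Q' && (20)
\end{aligned}
$$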


Entailment (14) in Proposition 1 shows that sensitivity of values is compatible
with equality. This property fails in the security separation logic of [14], where
labels are part of the semantics of expressions but are not compared by equality.
The second property (15) captures the intuition that less-sensitive data can
always be used in contexts where more-sensitive data might be expected (but not
vice versa). Recall that $e_\ell'$ here is an expression. The additional condition $e_\ell' :: \ell$
guarantees that this expression denotes a meaningful security level, i.e. evaluates
identically in both states (cf. (7)); here we abuse notation and let the semantic level $\ell$
stand for some expression that denotes it. Property (16) encodes that constants do not
depend on any state; again the security level expression $e_\ell$ must be meaningful,
but $e_\ell :: \ell$ holds trivially when $e_\ell$ is itself a constant. Value sensitivity is congruent with
function application (17). This is not surprising, as functions map arguments
that are equal in both states to equal results. Yet, as with (14) above, this property
fails in [14], where security labels are attached to values. Note that the reverse
entailment is false (e.g. for the constant function $\lambda x.\ c$).

Via (18), when $e_p \overset{e_\ell}{\mapsto} e_v$ holds, it follows that both the location $e_p$ and the value $e_v$
adhere to the level $e_\ell$, cf. (9). Note that the antecedent $e_p \overset{e_\ell}{\mapsto} e_v$ is repeated in
the consequent to ensure that the set of $\ell$-attacker visible locations is preserved.
Conditional assertions can be resolved when the test is definite, provided that $P$
and $Q$ describe the same set of public locations, (19), and symmetrically for $\neg\phi$.
Finally, separating conjunction is monotone with respect to entailment (20).
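The entailments of Proposition 1 can be sanity-checked on a tiny model. The Python sketch below is an illustration under stated assumptions, not part of SECC: it enumerates all pairs of stores over an assumed two-point lattice and a two-element value domain and checks (15) and (17) against the first condition of Definition 1 (the second condition is trivial for pure formulas). It also confirms that (15) fails without the side condition $e_\ell' :: \ell$, and that the reverse of (17) fails for a constant function. The lattice, value domain and helper names are hypothetical.

```python
from itertools import product

LOW, HIGH = "low", "high"
LABELS = (LOW, HIGH)
VALUES = (0, 1)  # hypothetical small value domain


def leq(a, b):
    """Assumed two-point security lattice: low ⊑ high."""
    return a == b or (a == LOW and b == HIGH)


def sens(e, el, s, t, l):
    """Rule (7): e :: el for the store pair (s, t) at attacker level l."""
    return el(s) == el(t) and (not leq(el(s), l) or e(s) == e(t))


def both(phi, s, t):
    """Rule (6): a boolean formula holds relationally iff it holds in both stores."""
    return phi(s) and phi(t)


def stores():
    for x, y, a, b in product(VALUES, VALUES, LABELS, LABELS):
        yield {"x": x, "y": y, "a": a, "b": b}


def entails(lhs, rhs):
    """First condition of Definition 1, checked exhaustively; the lows
    condition is trivial here because pure formulas have no low locations."""
    return all(not lhs(s, t, l) or rhs(s, t, l)
               for s in stores() for t in stores() for l in LABELS)


def var(n):
    return lambda st: st[n]


def const(c):
    return lambda st: c


X, Y, A, B = var("x"), var("y"), var("a"), var("b")


def below(st):
    """The boolean formula a ⊑ b."""
    return leq(A(st), B(st))


# (15) with its side condition b :: l
p15 = entails(lambda s, t, l: sens(X, A, s, t, l) and both(below, s, t)
              and sens(B, const(l), s, t, l),
              lambda s, t, l: sens(X, B, s, t, l))
# (15) without the side condition b :: l
p15_weak = entails(lambda s, t, l: sens(X, A, s, t, l) and both(below, s, t),
                   lambda s, t, l: sens(X, B, s, t, l))
# (17) for f = +, and its converse for the constant function 0
p17 = entails(lambda s, t, l: sens(X, A, s, t, l) and sens(Y, A, s, t, l),
              lambda s, t, l: sens(lambda st: st["x"] + st["y"], A, s, t, l))
p17_rev = entails(lambda s, t, l: sens(const(0), A, s, t, l),
                  lambda s, t, l: sens(X, A, s, t, l))
print(p15, p15_weak, p17, p17_rev)  # True False True False
```

The failing variant of (15) is witnessed by two states in which the label variable b is low in one store and high in the other: the antecedent holds, but $x :: b$ cannot, since its label does not coincide.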


3.3 Proof System 


We consider a canonical concurrent programming language with shared heap
locations protected by locks but without shared variables. Commands $c$ comprise
assignments to local variables, heap access (load and store),¹ sequential
programming constructs, as well as parallel composition and locking. We assume
a static collection of valid lock identifiers $l$, each of which has an assertion as its
associated invariant $\mathit{inv}(l)$, characterizing the protected portion of the heap. We
describe the program semantics in Sect. 4 as part of the soundness proof.

¹ Volatile memory locations can be treated analogously to locks by introducing an
additional assertion characterizing the part of the heap that is implicitly available
to atomic commands. This feature is realized in the Isabelle theories [18] but omitted
here in the interests of brevity.


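$$c ::= x := e \mid x := [e_p] \mid [e_p] := e_v \mid \mathbf{lock}\ l \mid \mathbf{unlock}\ l \mid c_1; c_2 \mid c_1 \parallel c_2 \mid \mathbf{if}\ b\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2 \mid \mathbf{while}\ b\ \mathbf{do}\ c$$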


The SECCSL proof rules are shown in Fig. 4. They extend the standard
rules of concurrent separation logic [38] (CSL) by additional side-conditions that
amount to information flow checks $e :: \_$ as part of the respective preconditions.

Similarly to [46], without loss of generality we require that assignments (rules
ASG, LOAD) are always to distinct variables, to avoid renaming in the assertions.
In the postcondition of LOAD, $x :: e_\ell$ can be derived by CONSEQ via (18). Storing
to a heap location through an $e_\ell$-sensitive location $e_p \overset{e_\ell}{\mapsto} \_$ (rule STORE)
requires that the value $e_v$ written to that location admits the corresponding security
level $e_\ell$ of the location $e_p$. Note that due to monotonicity (15) the security
level does not have to match exactly. The rules for locking are standard [12]. To
preclude information leaks through timing channels, the execution can branch
on non-secret values only. This manifests in side conditions $b :: \ell$ for the respective
branching condition $b$ where, recall, $\ell$ is the attacker security level (IF, WHILE).
Logical SPLIT picks those two cases where $[\![\phi]\!]_s = [\![\phi]\!]_{s'}$, ruling out the other two
by $\phi :: \ell$. The consequence rule (CONSEQ) uses entailment relative to $\ell$ (Definition 1).
Rule PAR has the usual proviso that the variables modified in one thread
cannot interfere with those relied on by the other thread and its pre-/postcondition.


4 Security Definition and Soundness 


The soundness theorem for SECCSL guarantees that if some triple $\ell \vdash \{P\}\ c\ \{Q\}$
is derived using the rules of Fig. 4, then: all executions of $c$ started in a state
satisfying precondition $P$ are memory safe, partially correct with respect to
postcondition $Q$, and moreover secure with respect to the sensitivity of values
as denoted by $P$ and $Q$, and at all times respect the sensitivity of locations as
denoted by $P$ (see Sect. 2.3). Proof outlines are relegated to Appendix B. All
results have been mechanised in Isabelle/HOL [37] and are available at [18].

The top-level security property of SECCSL is a noninterference condition [19].
Noninterference as a security property specifies, roughly, that for any
pair of executions that start in states that agree on the values of all
attacker-observable inputs, the resulting executions will be indistinguishable from the
attacker’s point of view, i.e. all of the attacker-visible observations will agree.
In SECCSL, what is “attacker-observable” depends on the attacker level $\ell$. The
“inputs” are the expressions $e$, and the attacker-visible inputs are those
expressions $e$ whose sensitivity is given by $e :: \ell'$ judgements in the precondition $P$ for
which $\ell' \sqsubseteq \ell$. The attacker-visible observations are the contents of all memory
locations in $\mathit{lows}_\ell(P, s)$, for initial store $s$ and precondition $P$. Thus we define
when two heaps are indistinguishable to the attacker.
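$$
\frac{x \notin \mathit{free}(e)}{\ell \vdash \{\mathsf{emp}\}\ x := e\ \{x = e\}}\ \text{ASG}
\qquad
\frac{x \notin \mathit{free}(e_p, e_v, e_\ell)}{\ell \vdash \{e_p \overset{e_\ell}{\mapsto} e_v\}\ x := [e_p]\ \{x = e_v \wedge e_p \overset{e_\ell}{\mapsto} e_v\}}\ \text{LOAD}
$$

$$
\ell \vdash \{e_v :: e_\ell \wedge e_p \overset{e_\ell}{\mapsto} \_\}\ [e_p] := e_v\ \{e_p \overset{e_\ell}{\mapsto} e_v\}\ \ \text{STORE}
$$

$$
\ell \vdash \{\mathsf{emp}\}\ \mathbf{lock}\ l\ \{\mathit{inv}(l)\}\ \ \text{LOCK}
\qquad
\ell \vdash \{\mathit{inv}(l)\}\ \mathbf{unlock}\ l\ \{\mathsf{emp}\}\ \ \text{UNLOCK}
$$

$$
\frac{\ell \vdash \{b \wedge P\}\ c_1\ \{Q\} \qquad \ell \vdash \{\neg b \wedge P\}\ c_2\ \{Q\}}{\ell \vdash \{b :: \ell \wedge P\}\ \mathbf{if}\ b\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\ \{Q\}}\ \text{IF}
\qquad
\frac{\ell \vdash \{\phi \wedge P\}\ c\ \{Q\} \qquad \ell \vdash \{\neg \phi \wedge P\}\ c\ \{Q\}}{\ell \vdash \{\phi :: \ell \wedge P\}\ c\ \{Q\}}\ \text{SPLIT}
$$

$$
\frac{\ell \vdash \{b \wedge b :: \ell \wedge P\}\ c\ \{b :: \ell \wedge P\}}{\ell \vdash \{b :: \ell \wedge P\}\ \mathbf{while}\ b\ \mathbf{do}\ c\ \{\neg b \wedge P\}}\ \text{WHILE}
\qquad
\frac{\ell \vdash \{P\}\ c\ \{Q\} \qquad \mathit{modified}(c) \cap \mathit{free}(F) = \emptyset}{\ell \vdash \{P \star F\}\ c\ \{Q \star F\}}\ \text{FRAME}
$$

$$
\frac{\ell \vdash \{P\}\ c_1\ \{R\} \qquad \ell \vdash \{R\}\ c_2\ \{Q\}}{\ell \vdash \{P\}\ c_1; c_2\ \{Q\}}\ \text{SEQ}
\qquad
\frac{P \Longrightarrow_\ell P' \qquad \ell \vdash \{P'\}\ c\ \{Q'\} \qquad Q' \Longrightarrow_\ell Q}{\ell \vdash \{P\}\ c\ \{Q\}}\ \text{CONSEQ}
$$

$$
\frac{\ell \vdash \{P_1\}\ c_1\ \{Q_1\} \qquad \ell \vdash \{P_2\}\ c_2\ \{Q_2\} \qquad \mathit{modified}(c_i) \cap \mathit{free}(c_j, P_j, Q_j) = \emptyset \text{ for } i \neq j}{\ell \vdash \{P_1 \star P_2\}\ c_1 \parallel c_2\ \{Q_1 \star Q_2\}}\ \text{PAR}
$$

Fig. 4. Proof rules of SECCSL.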


Definition 2 ($\ell$ Equivalence). Two heaps coincide on a set of locations $A$,
written $h =_A h'$, iff for all $a \in A$, $a \in \mathrm{dom}(h) \cap \mathrm{dom}(h')$ and $h(a) = h'(a)$.
Two heaps $h$ and $h'$ are $\ell$-equivalent wrt. store $s$ and assertion $P$ if $h =_A h'$ for
$A = \mathit{lows}_\ell(P, s)$.
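Definition 2 amounts to a simple check on finite heaps. In the hypothetical sketch below, heaps are Python dictionaries from locations to values, and `low_locations` computes $\mathit{lows}_\ell$ only for a separating conjunction of labelled points-to facts whose locations and labels have already been evaluated in the initial store; the location numbers and the two-point lattice are illustrative assumptions.

```python
LOW, HIGH = "low", "high"


def leq(a, b):
    """Assumed two-point security lattice: low ⊑ high."""
    return a == b or (a == LOW and b == HIGH)


def coincide(h, h2, A):
    """h =_A h2: every a in A is allocated in both heaps with equal contents."""
    return all(a in h and a in h2 and h[a] == h2[a] for a in A)


def low_locations(cells, l):
    """lows_l for a separating conjunction of labelled points-to facts,
    given as already-evaluated (location, label) pairs."""
    return {loc for loc, label in cells if leq(label, l)}


cells = [(100, LOW), (200, HIGH)]   # hypothetical: 100 is a sink, 200 a secret
A = low_locations(cells, LOW)
h1 = {100: 0, 200: 42}
h2 = {100: 0, 200: 7}
print(A, coincide(h1, h2, A))                        # {100} True
print(coincide(h1, h2, low_locations(cells, HIGH)))  # False
```

Note that `coincide` also fails when a location in $A$ is unallocated in either heap, matching the $\mathrm{dom}(h) \cap \mathrm{dom}(h')$ requirement of the definition.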


Then, the $\ell$-validity of an assertion $P$ in the relational semantics witnesses
$\ell$-equivalence between the corresponding heaps.

Lemma 1. If $(s,h),(s',h') \models_\ell P$, then $h =_A h'$ for $A = \mathit{lows}_\ell(P, s)$.


Furthermore, if $(s,h),(s',h') \models_\ell P$, then $\mathit{lows}_\ell(P, s) = \mathit{lows}_\ell(P, s')$, since the
security levels in labelled points-to predicates must coincide on $s$ and $s'$, cf. (9).


Semantics. Semantic configurations, denoted by k in the following, are one of 
three kinds: (run c, L,s,h) denotes a command c in a state s,h where L is a 
set of locks that are currently not held by any thread and can be acquired by c; 
(stop L, s, h) similarly denotes a final state s, h with residual locks L, and abort 
results from invalid heap access. 

The single-step relation $(\mathsf{run}\ c, L, s, h) \xrightarrow{\sigma} k$ takes running configurations
to successors $k$ with respect to a schedule $\sigma$ that resolves the non-determinism
of parallel composition. The schedule $\sigma$ is a list of actions: the action $\langle\tau\rangle$
represents the execution of atomic commands and the evaluation of conditionals;
the actions $\langle 1\rangle$ and $\langle 2\rangle$ respectively denote the execution of the left- and
right-hand sides of a parallel composition for a single step, and so define a
deterministic scheduling discipline reminiscent of separation kernels [32]. For
example, $(\mathsf{run}\ c_1 \parallel c_2, L, s, h) \xrightarrow{\langle 1\rangle \sigma} (\mathsf{run}\ c_1' \parallel c_2, L', s', h')$ if
$(\mathsf{run}\ c_1, L, s, h) \xrightarrow{\sigma} (\mathsf{run}\ c_1', L', s', h')$. Configurations $(\mathsf{run}\ \mathbf{lock}\ l, L, s, h)$ can only
be scheduled if $l \in L$ (symmetrically for $\mathbf{unlock}\ l$) and otherwise block without a
possible step. Executions $k_1 \xrightarrow{\sigma_1 \cdots \sigma_n}{}^{*} k_{n+1}$ chain several steps $k_i \xrightarrow{\sigma_i} k_{i+1}$ by
accumulating the schedule. We consider partial correctness only; thus the schedule
is always finite, and so are all executions. The rules for program steps are otherwise
standard and can be found in Appendix A.
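The role of the schedule can be illustrated with a small interpreter. The Python sketch below is a hypothetical model, not the semantics of Appendix A: it covers only atomic commands, sequencing and parallel composition (no locks, conditionals or abort), and, as a simplification, it collapses a parallel composition to its remaining branch once the other branch has terminated. In this model each step consumes $\langle\tau\rangle$ for an atomic command, preceded by one $\langle 1\rangle$ or $\langle 2\rangle$ for every enclosing parallel composition, so the same schedule always yields the same result.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

TAU, LEFT, RIGHT = "tau", 1, 2   # the actions <tau>, <1>, <2>


@dataclass
class Atom:        # an atomic command such as x := e or [e_p] := e_v
    run: Callable[[dict], None]


@dataclass
class Seq:         # c1; c2
    c1: object
    c2: object


@dataclass
class Par:         # c1 || c2
    c1: object
    c2: object


def step(c, state: dict, sched: List) -> Tuple[Optional[object], List]:
    """One step of c under a prefix of sched; returns (successor, rest).
    The successor is None when c has terminated."""
    if isinstance(c, Atom):
        if not sched or sched[0] != TAU:
            raise ValueError("an atomic command needs <tau>")
        c.run(state)
        return None, sched[1:]
    if isinstance(c, Seq):
        c1, rest = step(c.c1, state, sched)
        return (c.c2 if c1 is None else Seq(c1, c.c2)), rest
    if isinstance(c, Par):
        if not sched or sched[0] not in (LEFT, RIGHT):
            raise ValueError("a parallel composition needs <1> or <2>")
        if sched[0] == LEFT:
            c1, rest = step(c.c1, state, sched[1:])
            return (c.c2 if c1 is None else Par(c1, c.c2)), rest
        c2, rest = step(c.c2, state, sched[1:])
        return (c.c1 if c2 is None else Par(c.c1, c2)), rest
    raise TypeError(f"unsupported command {c!r}")


def run(c, state: dict, sched: List):
    """Chain steps until c terminates or the schedule is exhausted."""
    while c is not None and sched:
        c, sched = step(c, state, sched)
    return c


def assign(k, v):
    return Atom(lambda st: st.__setitem__(k, v))


def program():
    return Par(Seq(assign("a", 1), assign("out", "L")), assign("out", "R"))


s1, s2 = {}, {}
run(program(), s1, [LEFT, TAU, RIGHT, TAU, TAU])
run(program(), s2, [LEFT, TAU, LEFT, TAU, TAU])
print(s1["out"], s2["out"])  # L R
```

Different schedules resolve the race on `out` differently, but each fixed schedule determines the execution, which is what allows the two executions in the security definition below to be compared in lockstep.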


Compositional Security. To prove that SECCSL establishes its top-level
noninterference condition, we first define a compositional security condition that
provides the central characterization of security for a command $c$ with respect
to precondition $P$ and postcondition $Q$. We denote this central, compositional property
by $\mathit{secure}^n_\ell(P, c, Q)$ and formalize it below in Definition 3. It ensures that the
first $n$ steps (or fewer if the program terminates before that) are safe and preserve
$\ell$-equivalence of the heap locations specified initially in $P$, in a way that is
compositional across multiple execution steps, across multiple threads of execution
and across different parts of the heap. It is somewhat akin to, although more
precise than, prior characterizations based on strong low bisimulation [16,45].

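Disregarding for a moment the case when $c$ terminates before the $n$-th step,
for a pair of initial states $(s_1, h_1)$ and $(s_1', h_1')$, an initial set of locks $L_1$, and
a fixed schedule $\sigma = \sigma_1 \cdots \sigma_n$, $\mathit{secure}^n_\ell(P_1, c_1, Q)$ requires that $c_1$ performs a
sequence of lockstep execution steps from each initial state

$$
\begin{aligned}
(\mathsf{run}\ c_i, L_i, s_i, h_i) &\xrightarrow{\sigma_i} (\mathsf{run}\ c_{i+1}, L_{i+1}, s_{i+1}, h_{i+1})\\
(\mathsf{run}\ c_i, L_i, s_i', h_i') &\xrightarrow{\sigma_i} (\mathsf{run}\ c_{i+1}, L_{i+1}, s_{i+1}', h_{i+1}') \qquad \text{for } 1 \le i \le n
\end{aligned}
\qquad (21)
$$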


These executions must agree on the intermediate commands $c_i$ and locks $L_i$,
and the $i$-th pair of states must satisfy an intermediate assertion of the following
form:

$$(s_i, h_i), (s_i', h_i') \models_\ell P_i \star F \star \mathit{invs}(L_i) \quad \text{where} \quad \mathit{invs}(L_i) = \mathop{\bigstar}_{l_j \in L_i} \mathit{inv}(l_j) \qquad (22)$$

Here $P_i$ describes the part of the heap that command $c_i$ is currently accessing.
$\mathit{invs}(L_i)$ combines the lock invariants of the locks $l_j \in L_i$ not currently acquired.
Its presence ensures that whenever a lock is acquired, the associated invariant
can be assumed to hold. Finally, $F$ is an arbitrary frame, an assertion that does
not mention variables updated by $c_i$. Its inclusion allows the security property
to compose with respect to different parts of the heap.

Moreover, each $P_{i+1} \star \mathit{invs}(L_{i+1})$ is required to preserve the sensitivity of all
$\ell$-visible heap locations of $P_i \star \mathit{invs}(L_i)$, i.e. $\mathit{lows}_\ell(P_i \star \mathit{invs}(L_i), s_i) \subseteq
\mathit{lows}_\ell(P_{i+1} \star \mathit{invs}(L_{i+1}), s_{i+1})$. If some intermediate step $m < n$ terminates, then
$P_{m+1} = Q$, ensuring that the postcondition holds when the executions terminate.
Lastly, neither execution is allowed to reach an abort configuration.
If the initial state satisfies $P_1 \star F \star \mathit{invs}(L_1)$, then (22) holds throughout the
entire execution and establishes the end-to-end property that any final state
indeed satisfies the postcondition, and that $\mathit{lows}_\ell(P_1 \star \mathit{invs}(L_1), s_1) \subseteq \mathit{lows}_\ell(P_i \star
\mathit{invs}(L_i), s_i)$ with respect to the initially specified low locations.

The property $\mathit{secure}^n_\ell(P, c, Q)$ is defined recursively over $n$ to match the steps of the
lockstep execution of the program.


Definition 3 (Security).

- $\mathit{secure}^0_\ell(P_1, c_1, Q)$ holds always.
- $\mathit{secure}^{n+1}_\ell(P_1, c_1, Q)$ holds iff for all pairs of states $(s_1, h_1), (s_1', h_1')$,
  frames $F$, and sets of locks $L_1$ such that $(s_1, h_1), (s_1', h_1') \models_\ell P_1 \star
  F \star \mathit{invs}(L_1)$, and given two steps $(\mathsf{run}\ c_1, L_1, s_1, h_1) \xrightarrow{\sigma} k$ and
  $(\mathsf{run}\ c_1, L_1, s_1', h_1') \xrightarrow{\sigma} k'$, there exist an assertion $P_2$ and a pair of
  successor states with either of
  - $k = (\mathsf{stop}\ L_2, s_2, h_2)$ and $k' = (\mathsf{stop}\ L_2, s_2', h_2')$ and $P_2 = Q$, or
  - $k = (\mathsf{run}\ c_2, L_2, s_2, h_2)$ and $k' = (\mathsf{run}\ c_2, L_2, s_2', h_2')$ with
    $\mathit{secure}^n_\ell(P_2, c_2, Q)$,

  such that $(s_2, h_2), (s_2', h_2') \models_\ell P_2 \star F \star \mathit{invs}(L_2)$ and $\mathit{lows}_\ell(P_1 \star
  \mathit{invs}(L_1), s_1) \subseteq \mathit{lows}_\ell(P_2 \star \mathit{invs}(L_2), s_2)$ in both cases.

Two further side conditions are imposed, ensuring that all mutable shared state lies in
the heap (cf. Sect. 3): $c_1$ does not modify variables occurring in $\mathit{invs}(L_1)$ and $F$
(which guarantees that both remain intact), and the free variables of $P_2$ can
only be those already present in $P_1$, $c_1$, or some lock invariant (which
guarantees that $P_2$ remains stable against concurrent assignments). Note that
each step can pick a different frame $F$, as required for the soundness of rule PAR.


Lemma 2. $\ell \vdash \{P\}\ c\ \{Q\}$ implies $\mathit{secure}^n_\ell(P, c, Q)$ for every $n > 0$.


Safety, Correctness and Noninterference. Execution safety and correctness with 
respect to pre- and postcondition follow straightforwardly from Lemma 2. 


Corollary 1 (Safety). Given initial states $(s_1, h_1), (s_1', h_1') \models_\ell P \star \mathit{invs}(L_1)$
and two executions of a command $c$ under the same schedule to resulting
configurations $k$ and $k'$ respectively, $\ell \vdash \{P\}\ c\ \{Q\}$ implies $k \neq \mathsf{abort} \wedge k' \neq \mathsf{abort}$.


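Theorem 1 (Correctness). For initial states $(s_1, h_1), (s_1', h_1') \models_\ell P \star
\mathit{invs}(L_1)$, given two complete executions of a command $c$ under the same
schedule $\sigma$

$$
\begin{aligned}
(\mathsf{run}\ c, L_1, s_1, h_1) &\xrightarrow{\sigma}{}^{*} (\mathsf{stop}\ L_2, s_2, h_2)\\
(\mathsf{run}\ c, L_1, s_1', h_1') &\xrightarrow{\sigma}{}^{*} (\mathsf{stop}\ L_2, s_2', h_2')
\end{aligned}
$$

then $\ell \vdash \{P\}\ c\ \{Q\}$ implies $(s_2, h_2), (s_2', h_2') \models_\ell Q \star \mathit{invs}(L_2)$.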


The top-level noninterference property also follows from Lemma 2 via Lemma 
1. For brevity, we state the noninterference property directly in the theorem: 
Theorem 2 (Noninterference). Given a command $c$ and initial states
$(s_1, h_1), (s_1', h_1') \models_\ell P \star \mathit{invs}(L_1)$, $\ell \vdash \{P\}\ c\ \{Q\}$ implies $h_i =_A h_i'$, where
$A = \mathit{lows}_\ell(P \star \mathit{invs}(L_1), s_1)$, for all pairs of heaps $h_i$ and $h_i'$ arising from
executing the same schedule from each initial state.
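Theorem 2 also suggests a testing oracle. The Python harness below is a hypothetical sketch, not SECC and not a proof method: programs are modelled as Python generators that perform one atomic heap update per `yield`, the two executions advance in lockstep (a single thread, so the schedule is trivial), and agreement on the low locations (Definition 2) is checked after every step. A single pair of runs can only refute security, never establish it. The per-step comparison also catches the timing leak in `leaky_timing`, whose final heaps agree on the low locations.

```python
_DONE = object()


def lockstep_secure(prog, h1, h2, low_locs):
    """Run prog from two heaps in lockstep; return False as soon as the
    heaps differ on some low location (a timing-sensitive check)."""
    def agree():
        return all(a in h1 and a in h2 and h1[a] == h2[a] for a in low_locs)

    if not agree():
        raise ValueError("initial heaps must agree on the low locations")
    g1, g2 = prog(h1), prog(h2)
    while True:
        r1, r2 = next(g1, _DONE), next(g2, _DONE)
        if r1 is _DONE and r2 is _DONE:
            return True
        if not agree():
            return False


def no_leak(h):           # writes only low data to the sink "out"
    h["out"] = h["pub"] + 1
    yield
    h["tmp"] = h["secret"] * 2
    yield


def direct_leak(h):       # copies the secret into the sink
    h["out"] = h["secret"]
    yield


def leaky_timing(h):      # branches on the secret (ruled out by rule IF)
    if h["secret"]:
        h["tmp"] = 0
        yield
    h["out"] = 1
    yield


def heap(secret):
    return {"out": 0, "pub": 3, "tmp": 0, "secret": secret}


LOW_LOCS = {"out", "pub"}
for prog in (no_leak, direct_leak, leaky_timing):
    print(prog.__name__, lockstep_secure(prog, heap(1), heap(0), LOW_LOCS))
# no_leak True
# direct_leak False
# leaky_timing False
```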


5 SECC: Automating SECCSL


To demonstrate the ease with which SECCSL can be automated, we develop the
prototype tool SECC, available at [18]. It implements the logic from Sect. 3 for a 
subset of C. SECC is currently used to explore reasoning about example programs 
with interesting security policies. Thus its engineering has focused on features 
related to security reasoning (e.g. deciding when conditions e :: e; are entailed) 
rather than reasoning about complex data structures. 


Symbolic Execution. SECC automates SECCSL through symbolic execution, as
pioneered for SL in [7]. Similarly to VeriFast’s algorithm in [22], the verifier
computes the strongest postcondition of a command $c$ when executed in a symbolic
state, yielding a set of possible final symbolic states. Each such state $\sigma = (\rho, s, \overline{P})$
maintains a path condition $\rho$ of relational formulas (from procedure contracts,
invariants, and the evaluation of conditionals) and a symbolic heap described by
a list $\overline{P} = (P_1 \star \cdots \star P_n)$ of atomic spatial assertions (points-to and instances
of defined predicates). The symbolic store $s$ maps program variables to pure
expressions, where $s(e)$ denotes substituting $s$ into $e$. As an example, when
$P_i = s(e_p) \overset{e_\ell}{\mapsto} v$ is part of the symbolic heap, a load $x := [e_p]$ in $\sigma$ can be
executed to yield the updated state $(\rho, s(x := v), \overline{P})$ where $x$ is mapped to $v$.
To find $P_i$ we match the left-hand sides of points-to predicates. Similarly,
matching is used when checking entailments $\rho_1 \wedge \overline{P} \Longrightarrow \exists \overline{x}.\ \rho_2 \wedge \overline{Q}$,
where the conclusion is normalized to prenex form. The entailment is reduced
to a non-spatial problem by incrementally computing a substitution $\tau$ for the
existentials $\overline{x}$, removing pairs $P_i = \tau(Q_j)$ in the process, as justified by (20) (see
also “subtraction rules” in [7, Sec. 4]).
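The load step described above can be sketched as follows. This Python fragment is a hypothetical simplification, not SECC’s implementation: expressions are nested tuples, a symbolic state carries a path condition, a symbolic store and a tuple of labelled points-to facts, and matching is purely syntactic, whereas a real verifier may need the prover to establish that two location expressions are equal.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Tuple

# Program expressions: str = program variable, int = constant,
# tuple = (operator, arg1, arg2, ...). Symbolic values use the same shape.


def subst(store: Dict[str, Any], e):
    """s(e): replace program variables in e by their symbolic values."""
    if isinstance(e, str):
        return store[e]
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(store, a) for a in e[1:])
    return e


@dataclass(frozen=True)
class PointsTo:          # loc |->^{label} val
    loc: Any
    label: Any
    val: Any


@dataclass
class SymState:          # (rho, s, P_1 * ... * P_n)
    path: Tuple
    store: Dict[str, Any]
    heap: Tuple


def load(state: SymState, x: str, ep) -> SymState:
    """x := [ep]: match s(ep) against the left-hand sides of points-to facts."""
    loc = subst(state.store, ep)
    for cell in state.heap:
        if isinstance(cell, PointsTo) and cell.loc == loc:
            # The heap is unchanged; x is now mapped to the stored value.
            return replace(state, store={**state.store, x: cell.val})
    raise ValueError(f"no points-to fact for {loc!r}")


st = SymState(path=(), store={"p": "p0", "r": "r0"},
              heap=(PointsTo("p0", "low", "v0"),
                    PointsTo(("+", "r0", 1), "high", "d0")))
st = load(st, "x", "p")
st = load(st, "y", ("+", "r", 1))
print(st.store["x"], st.store["y"])  # v0 d0
```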

Finally, the remaining relational problem $\rho_1 \Longrightarrow \rho_2$ without spatial connectives
can be encoded into first-order logic [17] by duplicating the pure formulas in terms
of fresh variables that represent the second state, and by the syntactic equivalent
of (7). The resulting verification condition is discharged with Z3 [15]. This
translation is semantically complete. For example, consider Fig. 4 from Prabawa et
al. [43]. It has a conditional if (b == b) ..., whose check $(b = b) :: \mathit{low}$, translated
to $(b = b) = (b' = b')$ by SECC, holds independently of the sensitivity of b.
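The duplication step can be illustrated on the example just mentioned. The Python sketch below is hypothetical: formulas are nested tuples, `low(e)` is the syntactic counterpart of (7) for $e :: \mathit{low}$ at attacker level low (the label test is then trivially true, leaving $e = e'$), and a brute-force search over a small integer domain stands in for the SMT solver Z3 used by SECC, so it only demonstrates validity on that domain.

```python
from itertools import product

# Formulas: str = variable, int = constant, tuple = (operator, arg, ...).
OPS = {
    "=": lambda a, b: a == b,
    "+": lambda a, b: a + b,
    "and": lambda a, b: a and b,
    "implies": lambda a, b: (not a) or b,
}


def prime(e):
    """Rename every variable x to x' (the copy standing for the second state)."""
    if isinstance(e, str):
        return e + "'"
    if isinstance(e, tuple):
        return (e[0],) + tuple(prime(a) for a in e[1:])
    return e


def low(e):
    """e :: low at attacker level low becomes the first-order formula e = e'."""
    return ("=", e, prime(e))


def variables(f):
    if isinstance(f, str):
        return {f}
    if isinstance(f, tuple):
        return set().union(*(variables(a) for a in f[1:]))
    return set()


def evaluate(f, env):
    if isinstance(f, str):
        return env[f]
    if isinstance(f, tuple):
        return OPS[f[0]](*(evaluate(a, env) for a in f[1:]))
    return f


def valid(f, domain=range(3)):
    """Brute-force validity over a finite domain (a stand-in for Z3)."""
    vs = sorted(variables(f))
    return all(evaluate(f, dict(zip(vs, vals)))
               for vals in product(domain, repeat=len(vs)))


print(valid(low(("=", "b", "b"))))                       # True: no assumption about b
print(valid(low("b")))                                   # False: b itself may be secret
print(valid(("implies", low("a"), low(("+", "a", 1)))))  # True
```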


Features. In addition to the logic from Sect. 3, SECC supports procedure-modular
verification with pre-/postconditions as usual, and it supports user-defined
spatial predicates. While some issues of the C source language are not (yet)
addressed, such as integer overflow, those that directly impact information flow
security are taken into account. Specifically, the boolean operators && and ||, with
their short-circuit semantics, and the ternary operator _ ? _ : _ count as branching
points; as such, the left-hand side of && and || and the test of _ ? _ : _ must not
depend on sensitive data, similarly to the conditions of if statements and while loops.

A direct benefit of the integration of security levels into the assertion lan- 
guage is that it becomes possible to specify the sensitivity of data passed to 
library and operating system functions. For example, the execution time of 
malloc(len) would depend on the value of len, which can thus be required 
to satisfy len :: low by annotating its function header with an appropriate pre- 
condition, using SECC’s requires annotation. Likewise, SECC can reason about 
limited forms of declassification, in which external functions are trusted to safely 
release otherwise sensitive data, by giving them appropriate pre-/postconditions.
For example, a password hashing library function prototype might be annotated 
with a postcondition asserting its result is low, via SECC’s ensures annotation. 


Examples and Case Study. SECC proves Fig. 1 secure, and correctly flags buggy 
variants as insecure, e.g., where the test in thread 1 is reversed, or when thread 2 
does not clear the data field upon setting is_classified to FALSE. SECC also
correctly analyzes those 7 examples from [17] that are supported by the logic
and tool (each in ~10ms). All examples are available at [18]. 

To compare SECC and SECCSL against the recent COVERN logic [34], we 
took a non-trivial example program that Murray et al. verified in COVERN, man- 
ually translated it to C, and verified it automatically using SECC. The original 
program², written in COVERN’s tiny While language embedded in Isabelle/HOL,
models the software functionality of a simplified implementation of the Cross 
Domain Desktop Compositor (CDDC) [5]. The CDDC is a device that facili- 
tates interactions with multiple PCs, each of which runs applications at differing 
sensitivity, from a single keyboard, mouse and display. Its multi-threaded soft- 
ware handles routing of keyboard input to the appropriate PC and switching 
between the PCs via mouse gestures. Verifying the C translation required adding 
SECCSL annotations for procedure pre-/postconditions and loop invariants. The 
C translation including those annotations is ~250 lines in length. The present, 
unoptimised, implementation of SECC verifies the resulting artifact in ~5s. In 
contrast, the COVERN proof of this example requires ~600 lines of Isabelle/HOL 
definitions/specification, plus ~550 lines of Isabelle proof script.


6 Related Work 


There has been much work targeting type systems and program logics for con- 
current information flow. Karbyshev et al. [23] provide an excellent overview. 
Here we concentrate on work whose ideas are most closely related to SECCSL. 

Costanzo and Shao [14] propose a sequential separation logic for reasoning
about information flow. Unlike SECCSL, theirs does not distinguish value and
location sensitivity. Their separation logic assertions have a fairly standard
(non-relational) semantics, at the price of having a security-aware language semantics
that propagates security labels attached to values in the store and heap. As
mentioned in Sect. 3.2, this has the unfortunate side-effect of breaking intuitive
properties about sensitivity assertions. We conjecture that the absence of such
properties would make their logic harder to automate than SECCSL, whose
automation SECC demonstrates to be feasible. SECCSL avoids the aforementioned
drawbacks by adopting a relational assertion semantics.

² https://bitbucket.org/covern/covern/src/master/examples/cddc/Example_CDDC_WhileLockLanguage.thy

Gruetter and Murray [20] propose a security separation logic in Coq [8] for
Verifiable C, the C subset of the Verified Software Toolchain [2,3]. However, they
provide no soundness proof for its rules, and it is unclear whether it can feasibly be automated.

Two recent compositional logics for concurrent information flow are the
COVERN logic [34] and the type and effect system of Karbyshev et al. [23]. Both
borrow ideas from separation logic. However, unlike SECCSL, neither is defined
for languages with pointers, arrays etc.

Like SECCSL, COVERN proves a timing-sensitive security property. Location 
sensitivity is defined statically by value-dependent predicates, and value sensitivity
is tracked by a dependent security typing context $\Gamma$ [35], relative to a
Hoare logic predicate P over the entire shared memory. In COVERN locks carry 
non-relational invariants. In contrast, SECCSL unifies these elements together 
into separation logic assertions with a relational semantics. Doing so leads to a 
much simpler logic, amenable to automation, while supporting pointers, etc. 

On the other hand, Karbyshev et al. [23] prove a timing-insensitive security 
property, but rely on primitives to interact with the scheduler to prevent leaks via 
scheduling decisions. Unlike SECCSL, which assumes a deterministic scheduling 
discipline, Karbyshev et al. support a wider class of scheduling policies. Their sys- 
tem tracks resource ownership and transfer between threads at synchronisation 
points, similar to CSLs. Their resources include labelled scheduler resources that 
account for scheduler interaction, including when scheduling decisions become 
tainted by secret data—something that cannot occur in SECCSL’s deterministic 
scheduling model. 

Prior logics for sequential languages, e.g. [1,4], have also adopted separa- 
tion logic ideas to reason locally about memory, combining them with relational 
assertions similar to SECCSL’s $e :: e_\ell$ assertions. For instance, the agreement
assertions A(e) of [4] coincide with SECCSL’s e:: low. Unlike SECCSL, some of 
these logics support languages with explicit declassification actions [4]. 

Self-composition is another technique to exploit existing verification infras- 
tructure for proofs of general hyperproperties [13], including but not limited to 
non-interference. Eilers et al. [17] present such an approach for Viper, which 
supports an assertion language similar to that of separation logic. It does not
support public heap locations (which are information sources and sinks at the
same time), although sinks can be modeled via preconditions of procedures. A similar
approach is implemented in Frama-C [9]. Neither [9] nor [17] supports concurrency,
and it remains unclear how self-composition could avoid the exponential
blow-up from concurrent interleavings, which SECCSL avoids.
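To make the self-composition idea concrete, the Python sketch below forms the product of a program with a copy of itself in a sequential toy setting. Both copies receive the same public input but possibly different secrets, and the product asserts that the public outputs agree. The programs `leaky` and `safe` and the helpers `self_composed` and `holds_on_domain` are hypothetical. Exhaustive enumeration over a small domain stands in for the deductive verification of the product program that the approaches above perform.

```python
from itertools import product

# Hypothetical toy programs: `h` is a secret input, `l` a public input, and the
# return value is what the program writes to a public (low) output.
def leaky(h: int, l: int) -> int:
    return l + (1 if h > 0 else 0)  # public output depends on the secret

def safe(h: int, l: int) -> int:
    t = h * 0  # the secret is read, but its value cannot reach the output
    return l + t

def self_composed(prog, h1: int, h2: int, l: int) -> bool:
    """Product program: run one copy of `prog` per secret on the same public
    input, then assert that the two public outputs agree."""
    return prog(h1, l) == prog(h2, l)

def holds_on_domain(prog, domain=range(-2, 3)) -> bool:
    # Bounded check only: enumerate a small input domain. A verifier would
    # instead prove the product program's assertion for all inputs.
    return all(self_composed(prog, h1, h2, l)
               for h1, h2, l in product(domain, repeat=3))

print(holds_on_domain(safe), holds_on_domain(leaky))  # True False
```

The sketch only doubles the state of a sequential program. With threads, the product would also have to pair up the interleavings of both copies, which is where the blow-up arises.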

The soundness proof for SECCSL follows the general structure of
Vafeiadis' proof [46] for CSL, which is also mechanised in Isabelle/HOL. There is,
however, a technical difference: his analogue of Definition 3, a recursive predicate
called $\mathit{safe}_n(c, s, h, Q)$, refers to a semantic initial state $s, h$, whereas we propagate
a syntactic assertion (22) only. Our formulation has the benefit that some of the
technical reasoning in the soundness proof is easier to automate. Its drawback is
the need to impose technical side-conditions on the free variables of the frame $F$
and the intermediate assertions $P_i$.


7 Conclusion 


We presented SECCSL, a concurrent separation logic for proving expressive
data-dependent information flow properties of programs. SECCSL is considerably
simpler than contemporary logics, yet handles features like pointers and arrays,
which are out of their scope. It inherits the structure of traditional concurrent
separation logics, and so, like those logics, can be automated via symbolic
execution [10,22,30]. To demonstrate this, we implemented SECC, an automatic verifier for
expressive information flow security for a subset of the C language.

Separation logic has proved to be a remarkably powerful vehicle for reason- 
ing about programs, weak memory concurrency [47], program synthesis [42], and 
many other domains. With SECCSL, we hope that in the future the same possibili-
ties might be opened to verified information flow security.
A Command Semantics


Symmetric parallel rules, in which $c_2$ is scheduled under the action (2), are omitted.


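$$\frac{s' = s(x \mapsto \llbracket e \rrbracket_s)}{(\mathsf{run}\ x := e,\ L, s, h) \to (\mathsf{stop}\ L, s', h)}
\qquad
\frac{\llbracket e \rrbracket_s \notin \mathrm{dom}(h)}{(\mathsf{run}\ x := [e],\ L, s, h) \to \mathsf{abort}}$$

$$\frac{\llbracket e \rrbracket_s \in \mathrm{dom}(h) \quad s' = s(x \mapsto h(\llbracket e \rrbracket_s))}{(\mathsf{run}\ x := [e],\ L, s, h) \to (\mathsf{stop}\ L, s', h)}
\qquad
\frac{\llbracket e_1 \rrbracket_s \notin \mathrm{dom}(h)}{(\mathsf{run}\ [e_1] := e_2,\ L, s, h) \to \mathsf{abort}}$$

$$\frac{\llbracket e_1 \rrbracket_s \in \mathrm{dom}(h) \quad h' = h(\llbracket e_1 \rrbracket_s \mapsto \llbracket e_2 \rrbracket_s)}{(\mathsf{run}\ [e_1] := e_2,\ L, s, h) \to (\mathsf{stop}\ L, s, h')}$$

$$\frac{l \in L \quad L' = L \setminus \{l\}}{(\mathsf{run}\ \mathsf{lock}\ l,\ L, s, h) \to (\mathsf{stop}\ L', s, h)}
\qquad
\frac{l \notin L \quad L' = L \cup \{l\}}{(\mathsf{run}\ \mathsf{unlock}\ l,\ L, s, h) \to (\mathsf{stop}\ L', s, h)}$$

$$\frac{(\mathsf{run}\ c_1, L, s, h) \to \mathsf{abort}}{(\mathsf{run}\ c_1; c_2,\ L, s, h) \to \mathsf{abort}}
\qquad
\frac{(\mathsf{run}\ c_1, L, s, h) \to \mathsf{abort}}{(\mathsf{run}\ c_1 \parallel c_2,\ L, s, h) \to \mathsf{abort}}$$

$$\frac{(\mathsf{run}\ c_1, L, s, h) \to (\mathsf{stop}\ L', s', h')}{(\mathsf{run}\ c_1; c_2,\ L, s, h) \to (\mathsf{run}\ c_2, L', s', h')}
\qquad
\frac{(\mathsf{run}\ c_1, L, s, h) \to (\mathsf{run}\ c_1', L', s', h')}{(\mathsf{run}\ c_1; c_2,\ L, s, h) \to (\mathsf{run}\ c_1'; c_2, L', s', h')}$$

$$\frac{(\mathsf{run}\ c_1, L, s, h) \to (\mathsf{stop}\ L', s', h')}{(\mathsf{run}\ c_1 \parallel c_2,\ L, s, h) \to (\mathsf{run}\ c_2, L', s', h')}
\qquad
\frac{(\mathsf{run}\ c_1, L, s, h) \to (\mathsf{run}\ c_1', L', s', h')}{(\mathsf{run}\ c_1 \parallel c_2,\ L, s, h) \to (\mathsf{run}\ c_1' \parallel c_2, L', s', h')}$$

$$\frac{\text{if } s \models b \text{ then } c' = c_1 \text{ else } c' = c_2}{(\mathsf{run}\ \mathsf{if}\ b\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2,\ L, s, h) \to (\mathsf{run}\ c', L, s, h)}$$

$$\frac{s \not\models b}{(\mathsf{run}\ w,\ L, s, h) \to (\mathsf{stop}\ L, s, h)}
\qquad
\frac{s \models b}{(\mathsf{run}\ w,\ L, s, h) \to (\mathsf{run}\ (c; w), L, s, h)}
\qquad \text{where } w = \mathsf{while}\ b\ \mathsf{do}\ c$$

$$\frac{}{k \to^{*} k}
\qquad
\frac{k \to k' \quad k' \to^{*} k''}{k \to^{*} k''}$$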
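The rules above can be read as an interpreter. The Python sketch below implements the sequential fragment: assignment, load, store, lock, unlock, sequencing, conditionals, and loops. Parallel composition and schedules are omitted. Commands are represented as nested tuples, and expressions and tests are Python functions of the store. `L` is assumed to be the set of currently free locks, and `step` returns `None` when no rule applies. All names (`step`, `run`, the tuple tags) are illustrative assumptions, not part of SECCSL or SECC.

```python
def step(k):
    """One transition from a ('run', c, L, s, h) configuration. Returns the
    successor configuration, 'abort', or None when no rule applies (stuck)."""
    _, c, L, s, h = k
    op = c[0]
    if op == "assign":                      # x := e
        _, x, e = c
        return ("stop", L, {**s, x: e(s)}, h)
    if op == "load":                        # x := [e]
        _, x, e = c
        a = e(s)
        return "abort" if a not in h else ("stop", L, {**s, x: h[a]}, h)
    if op == "store":                       # [e1] := e2
        _, e1, e2 = c
        a = e1(s)
        return "abort" if a not in h else ("stop", L, s, {**h, a: e2(s)})
    if op == "lock":                        # assumes L = set of free locks
        l = c[1]
        return ("stop", L - {l}, s, h) if l in L else None
    if op == "unlock":
        l = c[1]
        return ("stop", L | {l}, s, h) if l not in L else None
    if op == "seq":                         # c1; c2
        _, c1, c2 = c
        k1 = step(("run", c1, L, s, h))
        if k1 is None or k1 == "abort":
            return k1                       # stuck or aborting c1 propagates
        if k1[0] == "stop":
            _, L1, s1, h1 = k1
            return ("run", c2, L1, s1, h1)
        _, c1_next, L1, s1, h1 = k1
        return ("run", ("seq", c1_next, c2), L1, s1, h1)
    if op == "if":                          # if b then c1 else c2
        _, b, c1, c2 = c
        return ("run", c1 if b(s) else c2, L, s, h)
    if op == "while":                       # w = while b do body
        _, b, body = c
        if not b(s):
            return ("stop", L, s, h)
        return ("run", ("seq", body, c), L, s, h)
    raise ValueError(f"unknown command {op!r}")

def run(c, L, s, h, fuel=10_000):
    """Iterate `step` until the configuration stops, aborts, or gets stuck."""
    k = ("run", c, L, s, h)
    while fuel > 0 and isinstance(k, tuple) and k[0] == "run":
        k = step(k)
        fuel -= 1
    return k

incr = ("while", lambda s: s["i"] < 3, ("assign", "i", lambda s: s["i"] + 1))
prog = ("seq", ("assign", "i", lambda s: 0), incr)
print(run(prog, frozenset(), {}, {}))  # ('stop', frozenset(), {'i': 3}, {})
```

Representing the stuck case as `None`, separately from `'abort'`, mirrors the rules: a lock that is not free has no applicable rule, whereas a dangling load or store has an explicit abort rule.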


B Proofs 


Proof of Lemma 1 


If $(s, h), (s', h') \models_\ell P$, then $h \simeq_A h'$ for $A = \mathit{lows}_\ell(P, s)$.


Proof. By induction on the structure of $P$, noting that $\mathit{lows}_\ell(\_, s)$ contains loca-
tions of the corresponding sub-heap only.
Proof of Lemma 2
$\ell \vdash \{P\}\ c\ \{Q\}$ implies $\mathit{secure}^n_\ell(P, c, Q)$ for every $n \geq 0$.


Proof (Outline). By induction on the derivation of the validity of the judgement. 
Noting that n = 0 is trivial, we may unfold the recursion of the security definition 
once to prove the base cases of assignment, load, store, and locking, which then 
follow from the respective side conditions of the proof rules. 

For rules IF and WHILE, the side condition $b :: \ell$ guarantees that the test
evaluates to the same value in the two states, and thus execution proceeds with the
same remainder program.

Except for IF, all remaining rules need a second induction on n to stepwise 
match security of the premise to security of the conclusion (e.g. over the steps 
of the first command in a sequential composition c1; c2). 

The rule FRAME instantiates the frame $F$ with the same assertion in each
step, whereas PAR uses the frame $F$ to preserve the current precondition $P_2$ of $c_2$
over steps of $c_1$ and vice versa.


Proof of Corollary 1 


Given a command $c$ and initial states $(s_1, h_1), (s_1', h_1') \models_\ell P \star \mathit{invs}(L_1)$ and
two executions under the same schedule to resulting configurations $k$ and $k'$
respectively, then $\ell \vdash \{P\}\ c\ \{Q\}$ implies $k \neq \mathsf{abort} \wedge k' \neq \mathsf{abort}$.


Proof. By induction on the number of steps $n$ of the executions from
$\mathit{secure}^n_\ell(P, c, Q)$ via Lemma 2.


Proof of Theorem 1 


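Given a command $c$ and initial states $(s_1, h_1), (s_1', h_1') \models_\ell P \star \mathit{invs}(L_1)$ and two
complete executions under the same schedule $\sigma$

$$(\mathsf{run}\ c, L_1, s_1, h_1) \xrightarrow{\sigma}{}^{*} (\mathsf{stop}\ L_2, s_2, h_2)$$
$$(\mathsf{run}\ c, L_1, s_1', h_1') \xrightarrow{\sigma}{}^{*} (\mathsf{stop}\ L_2, s_2', h_2')$$

then $\ell \vdash \{P\}\ c\ \{Q\}$ implies $(s_2, h_2), (s_2', h_2') \models_\ell Q \star \mathit{invs}(L_2)$.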


Proof. By induction on the number of steps $n$ of the executions from
$\mathit{secure}^n_\ell(P, c, Q)$ via Lemma 2.


Proof of Theorem 2 
Given a command $c$ and initial states $(s_1, h_1), (s_1', h_1') \models_\ell P \star \mathit{invs}(L_1)$, then


$\ell \vdash \{P\}\ c\ \{Q\}$ implies $h_i \simeq_A h_i'$, where $A = \mathit{lows}_\ell(P, s_1)$, for all pairs of heaps $h_i$
and $h_i'$ arising from executing the same schedule from each initial state.


Proof. By induction on the number of steps $i$ up to that state: from
$\mathit{secure}^n_\ell(P, c, Q)$ via Lemma 2 we have $\mathit{lows}_\ell(P \star \mathit{invs}(L_1), s_1) \subseteq \mathit{lows}_\ell(P_i \star \mathit{invs}(L_1), s_i)$
transitively over the prefix, where $P_i$ and $s_i$ are from the $i$-th state.
The theorem then follows from Lemma 1 in Sect. 3.1.
Abstract. Cloud services provide the ability to provision virtual net-
worked infrastructure on demand over the Internet. The rapid growth 
of these virtually provisioned cloud networks has increased the demand 
for automated reasoning tools capable of identifying misconfigurations 
or security vulnerabilities. This type of automation gives customers the 
assurance they need to deploy sensitive workloads. It can also reduce 
the cost and time-to-market for regulated customers looking to establish 
compliance certification for cloud-based applications. In this industrial 
case-study, we describe a new network reachability reasoning tool, called 
TIROS, that uses off-the-shelf automated theorem proving tools to fill this 
need. TIROS is the foundation of a recently introduced network security 
analysis feature in the Amazon Inspector service now available to millions 
of customers building applications in the cloud. TIROS is also used within 
Amazon Web Services (AWS) to automate the checking of compliance 
certification and adherence to security invariants for many AWS services 
that build on existing AWS networking features. 


1 Introduction 


Cloud computing provides on-demand access to IT resources such as compute, 
storage, and analytics via the Internet with pay-as-you-go pricing. Each of these
IT resources is typically networked together by customers, using a growing
number of virtual networking features. Amazon Web Services (AWS), for exam- 
ple, today provides over 30 virtualized networking primitives that allow cus- 
tomers to implement a wide variety of cloud-based applications. 

Correctly configured networks are a key part of an organization's security
posture. Clearly documented and, more importantly, verifiable network design
is important for compliance audits, e.g. the Payment Card Industry Data Secu-
rity Standard (PCI DSS) [10]. As the scale and diversity of cloud-based services
grows, each new offering used by an organization adds another dimension of pos-
sible interaction at the networking level. Thus, customers and auditors increas-
ingly need tooling for the security of their networks that is accurate, automated
and scalable, allowing them to automatically detect violations of their require-
ments.

© The Author(s) 2019

In this industrial case-study, we describe a new tool, called TIROS, which 
uses off-the-shelf automated theorem proving tools to perform formal analysis of 
virtual networks constructed using AWS APIs. TIROS encodes the semantics of 
AWS networking concepts into logic and then uses a variety of reasoning engines 
to verify security-related properties. Tools that TIROS can use include SOUFFLE 
[17], MONOSAT [3], and VAMPIRE [23]. TIROS performs its analysis statically: it 
sends no packets on the customer’s network. This distinction is important. The 
size of many customer networks makes it intractable to find problems through 
traditional network probing or penetration testing. TIROS allows users to gain 
assurance about the security of their networks that would be impossible through 
testing. 

TIROS is used directly today by AWS customers as part of the Amazon
Inspector service [11], which currently checks six TIROS-based network reach-
ability invariants on customer networks. The use of TIROS is especially pop-
ular amongst security-obsessed customers. For example, Bridgewater Associates,
the world's largest hedge fund and an AWS customer, recently discussed the importance of
network verification techniques for their organization [6], including their usage
of TIROS.


Related Work. Several previous tools using automated theorem proving have 
been developed in an effort to answer questions about software defined networks 
(SDNs) [1,2,5,12,13,16,19,25]. Similar to our approach, these tools reduce the 
problems to automated reasoning engines. In some cases, they employ over- 
approximative static analysis [18,19]. In other cases, they use general purpose 
reasoning engines such as Datalog [12,15], BDD [1], SMT [5,16], and SAT 
Solvers [2,25]. VeriCon [2], NICE [8], and VeriFlow [19] verify network invari- 
ants by analyzing software-defined-network (SDN) programs, with the former 
two applying formal software verification techniques, and the latter using static 
analysis to split routes into equivalence classes. SecGuru [5,16] uses an SMT 
solver to compare the routes admitted by access control lists (ACLs), routing 
tables, and border gateway protocol (BGP) policies, but does not support full- 
network reachability queries. In our approach we employ multiple encodings and 
reasoning engines. Our SMT encoding is similar in design to Anteater [25] and 
ConfigChecker [1]. Anteater performs SAT-based bounded model checking [4], 
while ConfigChecker uses BDD-based fixed-point model checking [7]. Previous 
work has applied Datalog to reachability analysis in either software or network 
contexts [12–14,24]. The approach used in Batfish [13,24] and SyNET [12] is
similar to our Datalog approach; they allow users to express general queries 
about whole-network reachability properties using an expressive logic language. 
Batfish presents results for small but complex routing scenarios, involving a few
dozen routers. SyNET [12] also uses a similar Datalog representation of network
reachability semantics, but rather than verifying network reachability properties,
they provide techniques to synthesize networks from a specification. The focus in
TIROS's encoding is expressiveness and completeness; it encodes the semantics of
the entire AWS cloud network service stack. It scales well to networks consisting
of hundreds of thousands of instances, routers, and firewall rules.


2 AWS Networking 


AWS provides customers with virtualized implementations of practically all 
known traditional networking concepts, e.g. subnets, route tables, and NAT 
gateways. In order to facilitate on-demand scalability, many AWS network fea- 
tures focus on elasticity, e.g. Elastic Load Balancers (ELBs) support autoscaling 
groups, which customers configure to describe when/how to scale resource usage. 
Another important AWS networking concept is that of Virtual Private Cloud 
(VPC), in which customers can use AWS resources in an isolated virtual net- 
work that they control. Over 30 additional networking concepts are supported 
by AWS, including Elastic Network Interfaces (ENIs), internet gateways, transit 
gateways, direct connections, and peering connections. 

Figure 1 shows an example AWS-based network that consists of two subnets
“Web” and “Database”. The “Web” subnet contains two instances (sometimes 
called virtual machines) and the “Database” subnet contains one instance. Note 
that these machines are in fact virtualized in the AWS data center. The “Web” 
subnet’s route table has a route to the internet gateway, whereas the “Database” 
subnet’s route table only has local routes (within the VPC). In addition, each 
of the subnets has an ACL that contains security access rules. In particular, one 
of the rules forbids SSH access to the database servers. 
Fig. 1. An example VPC network
AWS-based networks frequently start small and grow over time, accumulating
new instances and security and access rules. Customers or regulators want to
make sure that their VPC networks retain security invariants as their complexity
grows. A customer may ask network configuration questions such as:


1. "Are there any instances in subnet 'Web' that are tagged 'Bastion'?"

or network reachability questions such as:

2. "Are there any instances that can be accessed from the public internet over
SSH (TCP port 22)?"


To answer such questions we must reason about which network components are 
accessible via feasible paths through the VPC, either from the internet, from 
other components in the VPC, or from other components in a different VPC via 
a peering connection or transit gateway. 


3 AWS Networking Semantics as Logic 


TIROS statically builds a model of an AWS network architecture to check reach- 
ability properties. The model of the network consists of two parts, the formal 
specification and the snapshot of the network. The specification formalizes the 
semantics of the AWS networking components, e.g., how a route table directs 
traffic from a subnet, in which order a firewall applies rules in a security group, 
and how load balancers route traffic. The snapshot describes the topology and 
details of the network. For example, the snapshot contains the list of instances, 
subnets, and their route tables in a particular VPC (or set of VPCs). To answer 
reachability questions, TIROS combines the formal specification, the snapshot,
and a query into a formula that represents the answer. TIROS uses up to three
reasoning engines to answer queries: the Datalog solver SOUFFLE [17], the SMT
solver MONOSAT [3], or the first-order theorem prover VAMPIRE [23]. Due to
the differing limitations and capabilities of each of these tools, we maintain three
independent encodings of network semantics into logic, one for each solver.


Datalog Encoding. In the Datalog encoding, a network model is a set of Datalog 
clauses (stratified, possibly recursive or negated Horn clauses without function 
symbols) using the theory of bit vectors to describe ports, IPv4 addresses, and 
subnet masks. The specification part of the network model contains types, pred- 
icates, constants, and rules that describe the semantics of the networking com- 
ponents in Amazon VPCs. The specification of Amazon VPC networks maps to 
approximately 50 types, 200 predicates, and over 240 rules. For example, a spec- 
ification of the semantics of SSH tunneling is defined recursively: An instance 
can SSH tunnel to another instance iff it can either SSH to it directly, or through 
a chain of intermediate instances. We express this with predicates canSshTunnel
and canSsh, of the type Instance × Instance, and rules:


$$\mathit{canSshTunnel}(I_1, I_2) \leftarrow \mathit{canSsh}(I_1, I_2).$$
$$\mathit{canSshTunnel}(I_1, I_2) \leftarrow \mathit{canSshTunnel}(I_1, I_3) \wedge \mathit{canSshTunnel}(I_3, I_2).$$
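The two canSshTunnel rules compute a transitive closure. As a rough illustration, the Python sketch below evaluates them bottom-up by naive iteration to a least fixpoint. SOUFFLE compiles Datalog programs and uses a more efficient evaluation strategy, so this only models the meaning of the rules. The function name and the instance names in the facts are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def can_ssh_tunnel(can_ssh: Set[Pair]) -> Set[Pair]:
    """Naive bottom-up evaluation of the two canSshTunnel rules."""
    tunnel = set(can_ssh)  # rule 1: every canSsh fact is a canSshTunnel fact
    while True:
        # rule 2: join canSshTunnel(I1, I3) with canSshTunnel(I3, I2)
        derived = {(i1, i2) for (i1, i3) in tunnel
                   for (j3, i2) in tunnel if i3 == j3}
        new = derived - tunnel
        if not new:        # nothing new: least fixpoint reached
            return tunnel
        tunnel |= new

facts = {("bastion", "web-1"), ("web-1", "db-1")}  # hypothetical snapshot facts
print(sorted(can_ssh_tunnel(facts)))
# [('bastion', 'db-1'), ('bastion', 'web-1'), ('web-1', 'db-1')]
```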


The snapshot part of the network model contains constants and facts (ground 
clauses with no antecedents) that describe the configuration of a specific AWS 
network. Constants have the form $\mathit{type}_{\mathit{id}}$. For example, the snapshot of a network
with an instance with id 1234 in a subnet with id web consists of the constants
$\mathit{instance}_{1234}$ and $\mathit{subnet}_{\mathit{web}}$, and the fact $\mathit{hasSubnet}(\mathit{instance}_{1234}, \mathit{subnet}_{\mathit{web}})$.
We illustrate the Datalog encoding using examples from Sect. 2. The network
configuration question, $q(I)$, is encoded as $q(I) \leftarrow \mathit{hasSubnet}(I, \mathit{subnet}_{\mathit{web}}) \wedge \mathit{hasTag}(I, \mathit{tag}_{\mathit{bastion}})$.
The network reachability question, $r(I, E)$, is encoded as:


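$$r(I, E) \leftarrow \mathit{hasEni}(I, E) \wedge \mathit{isPublicIP}(\mathit{Address}) \wedge \mathit{reachPublicTcpUdp}(\mathit{dir}_{\mathit{ingress}}, \mathit{proto}_{6}, E, \mathit{port}_{22}, \mathit{Address}, \mathit{port}_{40000}).$$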


In our Datalog encoding, we use the theory of bitvectors to reason about 
ports, IP addresses, and CIDRs. We use SOUFFLE as our Datalog solver, but in 
principle other Datalog solvers could also be used, so long as they also support 
bitvectors. We direct the reader to our co-author’s dissertation (cf. Chapter 
7 [28]) for a more detailed explanation of the Datalog encoding. 
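Checking whether an address lies in a CIDR block is a typical bitvector constraint: the address and the network must agree on their first n bits. The Python sketch below shows that reduction on concrete 32-bit values. `ipv4_bv` and `in_cidr` are hypothetical helpers, not part of TIROS; a solver would state the same masked equality over symbolic bitvectors.

```python
def ipv4_bv(addr: str) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit bitvector."""
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_cidr(addr: str, cidr: str) -> bool:
    """addr lies in cidr iff both agree on the first prefix-length bits."""
    net, prefix = cidr.split("/")
    n = int(prefix)
    mask = (0xFFFFFFFF << (32 - n)) & 0xFFFFFFFF  # n leading one-bits
    return (ipv4_bv(addr) & mask) == (ipv4_bv(net) & mask)

print(in_cidr("10.0.0.5", "10.0.0.0/24"), in_cidr("10.0.1.5", "10.0.0.0/24"))  # True False
```

The parentheses around each `&` matter: in Python, `&` binds more loosely than `==`.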


[Figure 2. Left: symbolic graph of the VPC in Fig. 1, including the Internet node. Right: simplified symbolic packet with fields protocol (bv:8), srcAdr (bv:32), dstAdr (bv:32), srcPort (bv:16), dstPort (bv:16).]


Fig. 2. (Left) The symbolic graph corresponding to the VPC in Fig. 1. (Right) A 
simplified symbolic packet, composed of bitvectors. 


SMT Encoding. Our SMT encoding models network reachability as a symbolic 
graph of network components, along with one or more symbolic packet headers 
consisting of bitvectors for the source and destination addresses and ports. A 
symbolic graph consists of a set of nodes and directed edges, where the edges 
may be traversable or untraversable. Predicate edge(u,v), where u and v are 
nodes, is true iff the corresponding edge is traversable. The assignment of the 
edge(u,v) atoms in the formula determines which paths exist in the graph. 

Figure 2 shows a symbolic graph corresponding to the VPC from Fig. 1. In our 
encoding, nodes represent networking components (such as instances, network 
interfaces, subnets, route tables, or gateways), and edges represent possible paths 
that packets may take between those components (such as between an instance 
and its network interface). Constraints between edge atoms and bitvectors in 
the packet headers define the routes that a packet can take. 

For example, our encoding introduces an edge between each network interface 
node, Eni-a, and its containing Subnet-web node, edge(Eni-a, Subnet-web). As 
shown in Fig. 3, we also introduce constraints that force edge(Eni-a, Subnet-web)
to be false if the packet's source address does not match the ENI's IP address.
This ensures that packets leaving the ENI must have that ENI’s IP address as 
their source address. Similar constraints ensure that packets entering the ENI 
must have that ENI’s IP address as their destination address. 
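The constraints of Fig. 3 are implications. A mismatching address forces the edge literal to false, while a matching address merely leaves it unconstrained. The Python sketch below evaluates these guards for one concrete packet. The `Packet` class mirrors the simplified symbolic packet of Fig. 2, and the function name `edge_may_be_true` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    """Concrete instance of the simplified symbolic packet."""
    protocol: int  # bv:8
    src_adr: int   # bv:32
    dst_adr: int   # bv:32
    src_port: int  # bv:16
    dst_port: int  # bv:16

ENI_A = (10 << 24) | 5  # 10.0.0.5 as a 32-bit value

def edge_may_be_true(u: str, v: str, pkt: Packet) -> bool:
    """Return False when a constraint forces edge(u, v) to be false for pkt.
    The constraints are implications, so True only means 'not ruled out'."""
    if (u, v) == ("Eni-a", "Subnet-web"):   # leaving the ENI
        return pkt.src_adr == ENI_A
    if (u, v) == ("Subnet-web", "Eni-a"):   # entering the ENI
        return pkt.dst_adr == ENI_A
    return True  # no constraint on other edges in this fragment
```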

We encode reachability constraints into this graph using the SMT solver 
MonoSAT [3], which supports a theory of finite graph reachability. Specifically, 
we add a start and end node to the graph, with edges to the source components 
of the query and from the destination components of the query, and then we 
enforce a graph reachability constraint reaches(start, end), which is true iff there 
is a start-end path under assignment to the edge literals. To encode the query 
“Are there any instances that can be accessed from the public internet over 
SSH?”, we would add an edge from the start node to the internet, and from each 
EC2 instance to the end node. Additionally, we would add bitvector constraints 
forcing the protocol of the symbolic packet to be exactly 6 (TCP), and the 
destination port to be exactly 22. 
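The SSH query can be approximated on concrete data. MONOSAT searches over the symbolic packet and the edge literals at the same time. The Python sketch below instead fixes one packet, enables only the edges whose guards hold, and checks start-end reachability with breadth-first search. The graph fragment, the addresses, and the ACL rule blocking SSH into the database subnet are hypothetical stand-ins for the VPC of Fig. 1.

```python
from collections import deque

def ip(a: int, b: int, c: int, d: int) -> int:
    """Pack an IPv4 address into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d

WEB_IP, DB_IP = ip(10, 0, 0, 5), ip(10, 0, 1, 7)  # hypothetical addresses

# Hypothetical fragment of the Fig. 1 VPC: each directed edge has a guard over
# the packet header and may only be traversed when its guard holds.
EDGES = {
    ("Internet", "IGW"): lambda p: True,
    ("IGW", "Subnet-web"): lambda p: True,
    ("Subnet-web", "Eni-a"): lambda p: p["dstAdr"] == WEB_IP,
    ("Subnet-web", "Subnet-database"): lambda p: p["dstPort"] != 22,  # ACL: no SSH
    ("Subnet-database", "Eni-db"): lambda p: p["dstAdr"] == DB_IP,
    ("Eni-a", "Instance-a"): lambda p: True,
    ("Eni-db", "Instance-db"): lambda p: True,
}
INSTANCES = ("Instance-a", "Instance-db")

def reaches(pkt: dict) -> bool:
    """Breadth-first search for a start-end path using edges enabled for pkt."""
    edges = dict(EDGES)
    edges[("start", "Internet")] = lambda p: True  # query source
    for inst in INSTANCES:
        edges[(inst, "end")] = lambda p: True      # query destinations
    seen, queue = {"start"}, deque(["start"])
    while queue:
        u = queue.popleft()
        if u == "end":
            return True
        for (a, b), guard in edges.items():
            if a == u and b not in seen and guard(pkt):
                seen.add(b)
                queue.append(b)
    return False

def ssh_packet(dst: int) -> dict:
    """Query constraints: protocol exactly 6 (TCP), destination port exactly 22."""
    return {"protocol": 6, "srcAdr": ip(203, 0, 113, 9), "dstAdr": dst,
            "srcPort": 40000, "dstPort": 22}

print(reaches(ssh_packet(WEB_IP)), reaches(ssh_packet(DB_IP)))  # True False
```

With a symbolic packet, a single satisfying assignment would instead yield both a witness packet and a path, without fixing the packet in advance.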


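[Figure 3. Nodes Eni-a (address 10.0.0.5) and Subnet-web, connected by edge(Eni-a, Subnet-web) and edge(Subnet-web, Eni-a); symbolic packet with fields protocol (bv:8), srcAdr (bv:32), dstAdr (bv:32), srcPort (bv:16), dstPort (bv:16).]

$$(\mathit{srcAdr} \neq 10.0.0.5) \Rightarrow \neg\,\mathit{edge}(\text{Eni-a}, \text{Subnet-web})$$
$$(\mathit{dstAdr} \neq 10.0.0.5) \Rightarrow \neg\,\mathit{edge}(\text{Subnet-web}, \text{Eni-a})$$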


Fig. 3. A small portion of the VPC graph, with constraints over the edges between an 
ENI and its subnet enforcing that packets entering or leaving the ENI have that ENI’s 
source or destination address. 


The SMT encoding described above is intended specifically for answering 
network reachability queries, and does not currently take into account other 
properties (such as tags) that would be required to model the more general 
network configuration queries supported by our Datalog encoding.


First-Order Encoding. In our encoding for superposition solvers such as VAM- 
PIRE [23], we translate each network configuration question into a many-sorted 
first order logic problem that is unsatisfiable iff the answer to the question is true, 
and each network reachability question into a FOL problem that only has finite 
models, each corresponding to an answer to the question. For this encoding, 
we assume that network configuration questions have strictly yes/no answers, 
while network reachability questions return lists of solutions. In addition to its 
default saturation mode, VAMPIRE implements a MACE-style [26] finite model 
builder for many-sorted first-order logic [27]. Thus we use VAMPIRE both as a 
saturation-based theorem prover and a finite model builder, running both modes
in parallel and recording the result of the fastest successful run. 

Our encoding begins with the same set of facts as were generated from 
the network model by our Datalog encoding, represented here by the symbols 
$(A_1, A_2, \dots)$. From there, we handle network configuration and network reach-
ability questions differently: network-configuration encodings are optimized
for proof-by-contradiction, while reachability encodings are optimized for
model-building. Proof-by-contradiction for yes/no questions is potentially faster 
than model-building, as intermediate variables need not be enumerated. 

We encode a network configuration question $\psi$ in negated form: $A_1 \wedge \dots \wedge A_n \Rightarrow \neg\psi$.
If VAMPIRE can prove a contradiction in the negated formula, then $\psi$
holds. We encode a network reachability question $\psi$ into a formula of the form
$A_1 \wedge \dots \wedge A_n \wedge (\forall \bar{x})(q(\bar{x}) \rightarrow \psi) \Rightarrow (\forall \bar{x})\, q(\bar{x})$, where $q$ is a fresh predicate symbol,
and $\bar{x}$ are the free variables of the network question $\psi$. Each substitution of $\bar{x}$ that
satisfies $q$ corresponds to a distinct solution to the reachability question.

Our encoding targets VAMPIRE’s implementation of many-sorted first-order 
logic with equality, extended with the theory of linear integer arithmetic, the 
theory of arrays [22], and the theory of tuples [20]. We encode types, constants, 
and predicates using Clark completion [9]. We direct the reader to our co-author’s 
dissertation (cf. Chapter 5 [21]) for a more detailed explanation of the VAMPIRE 
encoding, including a detailed analysis of the performance trade-offs considered 
in this encoding. 
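As an illustrative sketch of the general technique (not the exact clauses TIROS generates), consider a hypothetical snapshot with instances $i_1, i_2, i_3$ of sort Instance and the facts $\mathit{hasSubnet}(i_1, \mathit{web})$ and $\mathit{hasSubnet}(i_2, \mathit{web})$. Clark completion turns the facts into a definition of the predicate. A finite sort is closed by a domain closure axiom that lists every constant, and distinctness axioms keep the constants apart:

$$
\begin{aligned}
&\forall x{:}\mathit{Instance}.\ \forall y{:}\mathit{Subnet}.\;
  \mathit{hasSubnet}(x, y) \Leftrightarrow (x = i_1 \wedge y = \mathit{web}) \vee (x = i_2 \wedge y = \mathit{web}) \\
&\forall x{:}\mathit{Instance}.\; x = i_1 \vee x = i_2 \vee x = i_3 \\
&i_1 \neq i_2 \wedge i_1 \neq i_3 \wedge i_2 \neq i_3
\end{aligned}
$$

The domain closure disjunction has one disjunct per constant of the sort, so it becomes a long clause for large sorts.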


4 Usage and Performance 


In this section we describe the performance of the various solvers when used by 
TIROS in practice. Recall that our MONOSAT implementation can only answer 
reachability questions, whereas the other implementations also answer more gen- 
eral network configuration questions (such as the examples in Sect. 2). 

In our experiments with VAMPIRE, we found that the first order logic encod- 
ing we used does not scale well. As we were not able to obtain good performance 
from our VAMPIRE-based implementation, in what follows we only present the 
experimental results for MONOSAT and SOUFFLE. We attribute the poor perfor-
mance of the VAMPIRE encoding mainly to the fact that large finite domains,
routinely used in network specifications, are represented as long clauses coming
from the domain closure axioms. Saturation theorem provers, including VAM-
PIRE, have a hard time dealing with such clauses.


Amazon Inspector. To compare the performance of SOUFFLE and MONOSAT
in the context of the TIROS-based Amazon Inspector feature, we randomly
selected 10,000 network snapshots evaluated in December 2018. On these queries
SOUFFLE required 4.1s in the best case and 45.1s in the worst case, with a 50th-
percentile runtime of 5.1s and a 90th-percentile runtime of 5.5s. MONOSAT
required 0.8s in the best case and 2.6s in the worst case, with a 50th-percentile
runtime of 1.39s and a 90th-percentile runtime of 1.79s. To give the reader an
idea of the relative size of the constraint systems solved, in the smallest case
our SOUFFLE encoding consisted of 2,856 facts, and the MONOSAT encoding
consisted of 609 variables, 21 bitvectors, and 2,032 clauses. In the largest case,
our SOUFFLE encoding consisted of 7,517 facts, and the MONOSAT encoding
consisted of 2,038 variables, 21 bitvectors, and 17,731 clauses.


Scalability Tests. MONOSAT and SOUFFLE scale to all queries evaluated
using Amazon Inspector. To help understand the limits of the SOUFFLE and
MONOSAT-based backends on larger networks, in Fig. 4 we compare the perfor-
mance of the solvers on a series of artificially generated networks of increasing
size, with 100, 1,000, 10,000, and 100,000 instances. In each case, the query is
"list all open paths from the Internet to any instance in the VPC". We can
see from the figure that neither approach dominates. In most cases the Data- 
log encoding is able to scale to 10,000 instances, but in no cases can it scale 
to 100,000 instances. In most cases the SMT encoding is able to scale to net- 
works with 100,000 instances, but for the ‘benchmark-2’ networks, MONOSAT 
requires almost a full hour to solve the 10,000 instance network that SOUFFLE 
solves in 81s. The SMT encoding performs poorly on ‘benchmark-2’ because 
that benchmark has a vast number of distinct feasible paths through the net- 
work, each requiring a separate SMT solver call. Other benchmarks have fewer 
distinct paths. 
Fig. 4. Comparison of runtime in seconds for the different solver backends. Each bench-
mark uses a different color, e.g. SOUFFLE on benchmark-1 is a solid blue line, and
MONOSAT on benchmark-1 is a dashed blue line. In these experiments, SOUFFLE
recompiles each query before solving it, which adds about 45s to the runtime of each
SOUFFLE query. In practice this cost can be amortized by caching compiled queries.
(Color figure online)


Automating PCI Compliance Auditing. Many AWS services are built using other 
AWS services, e.g. AWS Lambda is built using AWS EC2 and the various AWS 
networking features. Thus, within AWS we use TIROS to prove that our own
internal requirements are met. As an example, we use TIROS to
partially automate evidence generation for compliance audits against the Payment Card
Industry Data Security Standard (PCI DSS) [10]. TIROS is used across the many
customer-facing AWS services that are built using AWS networking to establish
controls supporting PCI DSS requirements 1.2, 1.3.1, 1.3.2, 1.3.4, and 1.3.7a.


Custom Application. AWS’s Professional Services team works with some of the 
most security-obsessed customers to use advanced tools such as TIROS to achieve 
custom-tailored solutions. For example, as discussed in a public lecture [6], 
Bridgewater Associates worked with AWS Professional Services to build a TIROS- 
based solution which proves invariants of new AWS-based network designs before 
they are deployed in Bridgewater’s AWS environment. Proof of these invariants 
assures the absence of possible data exfiltration paths that could be leveraged 
by an adversary. 


5 Conclusion 


We have described the first complete formalization of AWS networking semantics 
into logic. For customers of AWS services, TIROS provides deep insights into AWS 
networking. Via the incorporation of TIROS into the Amazon Inspector service, 
millions of AWS customers are able to automatically and continuously maintain 
their network-based security posture. They can now show compliance with secu- 
rity requirements at a scale that was impossible before. Internally within AWS, 
we are also able to automate some aspects of compliance evidence generation, 
which lowers our costs and increases our ability to quickly launch new features 
and services. 
Abstract. Verification of fault-tolerant distributed protocols is an
immensely difficult task. Often, in these protocols, thresholds on set car- 
dinalities are used both in the process code and in its correctness proof, 
e.g., a process can perform an action only if it has received an acknowl- 
edgment from at least half of its peers. Verification of threshold-based 
protocols is extremely challenging as it involves two kinds of reasoning: 
first-order reasoning about the unbounded state of the protocol, together 
with reasoning about sets and cardinalities. In this work, we develop a 
new methodology for decomposing the verification task of such proto- 
cols into two decidable logics: EPR and BAPA. Our key insight is that 
such protocols use thresholds in a restricted way as a means to obtain 
certain properties of “intersection” between sets. We define a language 
for expressing such properties, and present two translations: to EPR and 
BAPA. The EPR translation allows verifying the protocol while assuming 
these properties, and the BAPA translation allows verifying the correct- 
ness of the properties. We further develop an algorithm for automatically 
generating the properties needed for verifying a given protocol, facilitat- 
ing fully automated deductive verification. Using this technique we have 
verified several challenging protocols, including Byzantine one-step con- 
sensus, hybrid reliable broadcast and fast Byzantine Paxos. 


1 Introduction 


Fault-tolerant distributed protocols play an important role in the avionic and 
automotive industries, medical devices, cloud systems, blockchains, etc. Their 
unexpected behavior might put human lives at risk or cause a huge financial 
loss. Therefore, their correctness is of ultimate importance. 

Ensuring correctness of distributed protocols is a notoriously difficult task, 
due to the unbounded number of processes and messages, as well as the non- 
deterministic behavior caused by the presence of faults, concurrency, and mes- 
sage delays. In general, the problem of verifying such protocols is undecidable. 
This suggests two directions for attacking the problem: (i) developing fully-
automatic verification techniques for restricted classes of protocols, or (ii) design- 
ing deductive techniques for a wide range of systems that require user assistance. 
Within the latter approach, recently emerging techniques [29] leverage decidable 
logics that are supported by mature automated solvers to significantly reduce 
user effort, and increase verification productivity. Such logics bring several key 
benefits: (i) their solvers usually enjoy stable performance, and (ii) whenever 
annotations provided by the user are incorrect, the automated solvers can pro- 
vide a counterexample for the user to examine. 

Deductive verification based on decidable logic requires a logical formalism 
that satisfies two conflicting criteria: the formalism should be expressive enough 
to capture the protocol, its correctness properties, its inductive invariants, and 
ultimately its verification conditions. At the same time, the formalism should be 
decidable and have an effective automated tool for checking verification conditions. 

In this paper we develop a methodology for deductive verification of
threshold-based distributed protocols that uses well-established decidable
logics to settle the tension explained above.

In threshold-based protocols, a process may take different actions based on 
the number of processes from which it received certain messages. This is often 
used to achieve fault-tolerance. For example, a process may take a certain step 
once it has received an acknowledgment from a strict majority of its peers, that 
is, from more than n/2 processes, where n is the total number of processes. 
Expressions such as n/2 are called thresholds, and in general they can depend
on additional parameters, such as the maximal number of crashed processes, or 
the maximal number of Byzantine processes. 

Verification of such protocols requires two flavors of reasoning, as demon- 
strated by the following example. Consider the Paxos [20] protocol, in which 
each process proposes a value and all must agree on a common proposal. The 
protocol tolerates up to t process crashes, and ensures that every two processes 
that decide agree on the decided value. The protocol requires n > 2t processes, 
and each process must obtain confirmation messages from n − t processes before
making a decision. The protocol is correct due to, among others, the fact that
if n > 2t then any two sets of n − t processes have a process in common. To
verify this protocol we need to express (i) relationships between an unbounded
number of processes and values, which typically requires quantification over
uninterpreted domains (“every two processes”), and (ii) properties of sets of certain
cardinalities (“any two sets of n − t processes intersect”). Crucially, these two
types of reasoning are intertwined, as the sets of processes for which we need to
capture cardinalities may be defined by their relations with other state
components (“messages from at least n − t processes”). While uninterpreted first-order
logic (FOL) seems like the natural fit for the first type of reasoning, it is seemingly 
a poor fit for the second type, since it cannot express set cardinalities and the 
arithmetic used to define thresholds. Typically, logics that combine both types 
of reasoning are either undecidable or not flexible enough to capture protocols 
as intricate as the ones we consider. 


Verification of Threshold-Based Distributed Algorithms 247 


The approach we present relies on the observation that threshold-based pro- 
tocols and their correctness proofs use set cardinality thresholds in a restricted 
way as a means to obtain certain properties between sets, and that these prop- 
erties can be expressed in FOL via a suitable encoding. In the example above, 
the important property is that every two sets of cardinality at least n − t have a
non-empty intersection. This property can be encoded in FOL by modeling sets
of cardinality at least n − t using an uninterpreted sort along with a membership
relation between this sort and the sort for processes. However, the validity of 
the property under the assumption that n > 2t cannot be verified in FOL. 
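FOL cannot establish this arithmetic fact on its own, but it can at least be sanity-checked on small instances. The following Python sketch is an illustration only, and the function name quorums_intersect is hypothetical. It enumerates all subsets of {0, ..., n−1} with at least n − t elements and confirms, for n ≤ 6, that every two of them intersect exactly when n > 2t. Exhaustive enumeration over tiny universes is not a proof for unbounded n; that is the role of the BAPA checks described below.

```python
from itertools import combinations

def quorums_intersect(n, t):
    """True iff every two subsets of range(n) with >= n - t elements share a node."""
    quorums = [set(c) for r in range(max(n - t, 0), n + 1)
               for c in combinations(range(n), r)]
    return all(q1 & q2 for q1 in quorums for q2 in quorums)

# The brute-force answer coincides with the resilience condition n > 2t.
for n in range(1, 7):
    for t in range(n):
        assert quorums_intersect(n, t) == (n > 2 * t)
```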

The key idea of this paper is, hence, to decompose the verification problem 
of threshold-based protocols into the following problems: (i) Checking protocol 
correctness assuming certain intersection properties, which can be reduced to 
verification conditions expressed in the Effectively Propositional (EPR) frag- 
ment of FOL [25,35]. (ii) Checking that sets with cardinalities adhering to the 
thresholds satisfy the intersection properties (under the protocol assumptions), 
which can be reduced to validity checks in quantifier-free Boolean Algebra with 
Presburger Arithmetic (BAPA) [19]. Both BAPA and EPR are decidable logics, 
and are supported by mature solvers. 

A crucial step in employing this decomposition is finding suitable intersection 
properties that are strong enough to imply the protocol’s correctness (i.e., imply 
the FOL verification conditions), and are also implied by the precise definitions 
of the thresholds and the protocol’s assumptions. Thus, these intersection prop- 
erties can be viewed as interpolants between the FOL verification conditions 
and the thresholds in the context of the protocol’s assumptions. We present 
fully automated procedures to find such intersection property interpolants, either 
eagerly or lazily. 

The main contributions of this paper are¹:


1. We define a threshold intersection property (TIP) language for expressing 
properties of sets whose cardinalities adhere to certain thresholds; TIP is 
expressive enough to capture the properties required to prove the correctness 
of challenging threshold-based protocols. 

2. We develop two encodings of TIP, one in BAPA, and another in EPR. These 
encodings facilitate decomposition of protocol verification into decidable EPR 
and (quantifier-free) BAPA queries. 

3. We show that there are only finitely many TIP formulas (up to equivalence) 
that are valid for any given protocol. Moreover, we present an effective algo- 
rithm for computing all TIP formulas valid for a given protocol, as well as an 
algorithm for lazily finding a set of TIP formulas that suffice to prove a given 
protocol. 

4. Put together, we obtain an effective deductive verification approach for
threshold-based protocols: the user models the protocol and its inductive
invariants in EPR using a suitable encoding of thresholds, and defines the
thresholds and the protocol’s assumptions using arithmetic; verification is
carried out automatically via decomposition to well-established decidable logics.

5. We implement the approach, leveraging mature existing solvers (Z3 and
CVC4), and evaluate it by verifying several challenging threshold-based
protocols with sophisticated thresholds and assumptions. Our evaluation shows
the effectiveness and flexibility of our approach in modeling and verifying
complex protocols, including the feasibility of automatically inferring
threshold intersection properties.

¹ An extended version of this paper, which includes additional details and proofs,
appears in [3].


2 Preliminaries 


Transition Systems in FOL. We model distributed protocols as transition
systems expressed in many-sorted FOL. A state of the system is a first-order
(FO) structure s = (D, I) over a vocabulary Σ that consists of sorted constant,
function and relation symbols, s.t. s satisfies a finite set of axioms Θ in the form
of closed formulas over Σ. D is the domain of s mapping each sort to a set of
objects (elements), and I is the interpretation function. A FO transition system
is a tuple (Σ, Θ, ι, TR), where Σ and Θ are as above, ι is a closed formula over
Σ that defines the initial states, and TR is a closed formula over Σ ⊎ Σ′ that
defines the transition relation, where Σ describes the source state of a transition
and Σ′ = {a′ | a ∈ Σ} describes the target state. We require that TR does not
modify any symbol that appears in Θ. The set of reachable states is defined as
usual. In practice, we define FO transition systems using a modeling language
with a convenient syntax [29].


Properties and Inductive Invariants. A safety property is expressed by a
closed FO formula P over Σ. The system is safe if all of its reachable states
satisfy P. A closed FO formula Inv over Σ is an inductive invariant for a
transition system (Σ, Θ, ι, TR) and property P if the following formulas, called the
verification conditions, are valid (equivalently, their negations are unsatisfiable):
(i) Θ → (ι → Inv), (ii) Θ → (Inv ∧ TR → Inv′), and (iii) Θ → (Inv → P),
where Inv′ results from substituting every symbol in Inv by its primed version.
We also use inductive invariants to verify arbitrary first-order LTL formulas via
the reduction of [30,31].
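The three conditions can be illustrated on a toy system whose state space is small enough to enumerate. The Python sketch below assumes that states are plain Python values and that Θ, ι, TR, Inv and P are given as predicates. This is a simplification: here, states are FO structures and the VCs are discharged by a solver. The names are hypothetical. The sketch returns None when conditions (i) to (iii) hold and otherwise returns a witness, which for condition (ii) is a counterexample to induction.

```python
from itertools import product

def check_vcs(states, axioms, init, trans, inv, prop):
    """Check VCs (i)-(iii) over a finite state space; return None or a failing witness."""
    for s in states:
        if not axioms(s):
            continue
        if init(s) and not inv(s):                  # (i)   Θ → (ι → Inv)
            return ("initiation", s)
        if inv(s) and not prop(s):                  # (iii) Θ → (Inv → P)
            return ("safety", s)
    for s, s2 in product(states, repeat=2):         # (ii)  Θ → (Inv ∧ TR → Inv′)
        if axioms(s) and axioms(s2) and inv(s) and trans(s, s2) and not inv(s2):
            return ("consecution", (s, s2))
    return None

# Toy system: a counter that starts at 0 and adds 2 as long as it stays below 10.
STATES = range(10)

def step(x, y):
    return y == x + 2 if x + 2 <= 9 else y == x

even_ok = check_vcs(STATES, lambda s: True, lambda s: s == 0, step,
                    lambda s: s % 2 == 0, lambda s: s != 7)
naive = check_vcs(STATES, lambda s: True, lambda s: s == 0, step,
                  lambda s: s != 7, lambda s: s != 7)
```

For this counter, the even-number invariant passes all three checks. Taking the property P itself as the invariant fails consecution on the transition from 5 to 7.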


Effectively Propositional Logic (EPR). The effectively-propositional
(EPR) fragment of FOL is restricted to formulas without function symbols and
with a quantifier prefix ∃*∀* in prenex normal form. Satisfiability of EPR
formulas is decidable [25]. Moreover, EPR formulas enjoy the finite model property,
i.e., φ is satisfiable iff it has a finite model. We consider a straightforward
extension of EPR that maintains these properties and is supported by solvers such as
Z3 [5]. The extension allows function symbols and quantifier alternations as long
as the formula’s quantifier alternation graph, denoted QA(φ), is acyclic. For φ
in negation normal form, QA(φ) is a directed graph where the set of vertices is
the set of sorts and the set of edges is defined as follows: every function symbol
introduces edges from its arguments’ sorts to its image’s sort, and every
existential quantifier ∃x that resides in the scope of universal quantifiers introduces
edges from the sorts of the universally quantified variables to the sort of x. The
quantifier alternation graph is extended to sets of formulas as expected.
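To make the definition of QA(φ) concrete, the Python sketch below builds the graph from two kinds of hypothetical input: function signatures, and (universal sorts, existential sort) pairs read off a formula in negation normal form. It then checks acyclicity with a depth-first search. Parsing formulas is out of scope, and the input encoding is an assumption of this sketch. For instance, an axiom of the shape ∀x : set_a. ∀y : set_b. ∃m : node. ... contributes the edges set_a → node and set_b → node. These stay acyclic until some other formula adds an edge from node back to set_a.

```python
from collections import defaultdict

def qa_graph(functions, alternations):
    """Build quantifier-alternation edges between sorts.

    functions:    (argument_sorts, result_sort) for each function symbol
    alternations: (universal_sorts, existential_sort) for each ∃ in the scope of ∀s
    """
    edges = defaultdict(set)
    for args, result in functions:
        for sort in args:
            edges[sort].add(result)
    for universals, existential in alternations:
        for sort in universals:
            edges[sort].add(existential)
    return edges

def is_acyclic(edges):
    """Depth-first search for a back edge (a cycle) in the sort graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)          # every sort starts WHITE

    def visit(v):
        color[v] = GRAY
        for w in edges.get(v, ()):
            if color[w] == GRAY:      # back edge: cycle found
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(color[v] != WHITE or visit(v) for v in list(edges))
```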


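Boolean Algebra with Presburger Arithmetic (BAPA). Boolean Algebra
with Presburger Arithmetic (BAPA) [19] is a FO theory defined over two sorts:
int (for integers), and set (for subsets of a finite universe). The language is defined
as follows:

$$
\begin{aligned}
F &::= B_1 = B_2 \mid L_1 = L_2 \mid L_1 < L_2 \mid F_1 \land F_2 \mid F_1 \lor F_2 \mid \lnot F \mid \exists x.\,F \mid \forall x.\,F \mid \exists u.\,F \mid \forall u.\,F\\
B &::= x \mid \emptyset \mid a \mid B_1 \cup B_2 \mid B_1 \cap B_2 \mid B^c\\
L &::= u \mid K \mid n \mid \hat{\imath} \mid L_1 + L_2 \mid K \cdot L \mid |B|
\end{aligned}
$$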


where L defines linear integer terms, where u denotes an integer variable, k ∈ K
defines an (interpreted) integer constant symbol ..., −2, −1, 0, 1, 2, ..., n is an
integer constant symbol that represents the size of the finite set universe, î is an
uninterpreted integer constant symbol (as opposed to the constant symbols from
K), and |b| denotes set cardinality; B defines set terms, where x denotes a set
variable, ∅ is an (interpreted) set constant symbol that represents the empty set,
and a is an uninterpreted set constant symbol; and F defines the set of BAPA
formulas, where ℓ1 = ℓ2 and ℓ1 < ℓ2 are atomic arithmetic formulas and b1 = b2
is an atomic set formula. (Other set constraints such as b1 ⊆ b2 can be encoded
in the usual way.) In the sequel, we also allow arithmetic terms of the form ℓ/k
where k ∈ K is a positive integer and ℓ ∈ L, as any formula that contains such
terms can be translated to an equivalent BAPA formula by multiplying by k.

A BAPA structure is s_B = (D, I) where the domain D maps sort int to the
set of all integers and maps sort set to the set of all subsets of a finite universe
U, called the universal set. The semantics of terms and formulas is as expected,
where the interpretation of the complement operation is defined with respect to
U (e.g., I(∅^c) = U), and the integer constant n is interpreted as the size of U,
i.e., I(n) = |U|.

Both validity and satisfiability of BAPA formulas (with arbitrary quantifi- 
cation) are decidable [19], and the quantifier-free fragment is supported by 
CVC4 [2]. 
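The BAPA semantics above can be mirrored by a small evaluator, in particular the rules that complement is taken relative to U and that n denotes |U|. The Python sketch below is a hypothetical illustration: the tuple encoding of terms and all function names are ours. It covers set terms, cardinality, and the rewriting of a guard |b| ≥ ℓ/k into the division-free form k·|b| ≥ ℓ. It evaluates terms in one given structure and is not a decision procedure.

```python
def eval_set(term, universe, env):
    """Interpret a BAPA set term; the complement is taken relative to the universe U."""
    op = term[0]
    if op in ("var", "const"):       # set variable x or uninterpreted set constant a
        return env[term[1]]
    if op == "empty":
        return frozenset()
    if op == "union":
        return eval_set(term[1], universe, env) | eval_set(term[2], universe, env)
    if op == "inter":
        return eval_set(term[1], universe, env) & eval_set(term[2], universe, env)
    if op == "compl":
        return universe - eval_set(term[1], universe, env)
    raise ValueError(f"unknown set term {op!r}")

def card(term, universe, env):
    """|b| for a set term b."""
    return len(eval_set(term, universe, env))

def at_least(size, ell, k):
    """|b| >= ell/k, written without division as k*|b| >= ell (k a positive integer)."""
    return k * size >= ell
```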


3 First-Order Modeling of Threshold-Based Protocols 


Next we explain our modeling of threshold-based protocols as transition systems
in FOL (note that FOL cannot directly express set cardinality constraints). The
idea is to capture each threshold by a designated sort, such that elements of
this sort represent sets of nodes that satisfy the threshold. Elements of the
threshold sort are then used instead of the actual threshold in the description of
the protocol and in the verification conditions. For verification to succeed, some
properties of the sets satisfying the cardinality threshold must be captured in
FOL. This is done by introducing additional assumptions (formally, axioms of
the transition system) expressed in FOL, as discussed in Sect. 4.


Fig. 1 (left): informal pseudo-code of Bosco.

   2  Input: v_p
   3  broadcast v_p to all processes
   4  wait until n − t messages have been received
   5  if there exists v s.t. more than (n+3t)/2
   6      messages contain value v then
   7    DECIDE(v)
   8  if there exists exactly one v s.t. more than (n−t)/2
   9      messages contain value v then
  10    v_p ← v
  11  underlying-consensus(v_p)

Fig. 1 (right): RML model of the main transition (excerpt).

   1  sort node, value, set_{n−t}, set_{(n+3t+1)/2}, set_{(n−t+1)/2}
   4  assume ∃q : set_{n−t}. ∀m : node. member(m, q) →
          ∃u : value. rcv_msg(n, m, u)
   5  if ∃v : value, q : set_{(n+3t+1)/2}. ∀m : node.
   6      member(m, q) → rcv_msg(n, m, v) then
   7    decision(n, v) := true
   8  if ∃!v : value. ∃q : set_{(n−t+1)/2}. ∀m : node.
   9      member(m, q) → rcv_msg(n, m, v) then
  10    v_p := v
  11  und_cons(n, v_p) := true

Fig. 1. Bosco: a one-step asynchronous Byzantine consensus algorithm [39], and an
excerpt of RML (relational modeling language) code for the main transition. Note that
we overload the member relation for all threshold sorts. The formula ∃!x. φ(x) is
shorthand for “there exists exactly one x such that φ(x)”.


Running Example. We illustrate our approach using the example of Bosco,
an asynchronous Byzantine fault-tolerant (BFT) consensus algorithm [39]. Its
modeling in first-order logic using our technique appears alongside informal
pseudo-code in Fig. 1.

In the BFT consensus problem, each node proposes a value and correct nodes 
must decide on a unique proposal. BFT consensus algorithms typically require 
at least two communication rounds to reach a decision. In Bosco, nodes execute 
a preliminary communication step which, under favorable conditions, reaches an 
early decision, and then call an underlying BFT consensus algorithm to ensure 
reaching a decision even if these conditions are not met. Bosco is safe when 
n > 3t; it guarantees that a preliminary decision will be reached if all nodes are 
non-faulty and propose the same value when n > 5t (weakly one-step condition), 
and even if some nodes are faulty, as long as all non-faulty nodes propose the 
same value, when n > 7t (strongly one-step condition). 

Bosco achieves consensus by ensuring that (a) no two correct nodes decide
differently in the preliminary step, and (b) if a correct node decides value v
in the preliminary step then every correct process calls the underlying BFT
consensus algorithm with proposal v. Property (a) is ensured by the fact that
a node decides in the preliminary step only if more than (n+3t)/2 nodes proposed
the same value. When n > 3t, two sets of cardinality greater than (n+3t)/2 have
at least one non-faulty node in common, and therefore no two different values
can be proposed by more than (n+3t)/2 nodes. Similarly, we can derive property
(b) from the fact that a set of more than (n+3t)/2 nodes and a set of n − t nodes
intersect in more than (n+t)/2 nodes, which, after removing t nodes which may be
faulty, still leaves us with more than (n−t)/2 nodes, satisfying the condition in line 9.
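Both cardinality arguments reduce to the bound |A ∩ B| ≥ |A| + |B| − n. The Python sketch below uses hypothetical names and exact rationals via fractions.Fraction, so that bounds such as (n+3t)/2 are not rounded. It takes the smallest set sizes allowed by "more than (n+3t)/2" and "n − t" and checks both steps for a range of n and t satisfying n > 3t. It is a numeric sanity check of the reasoning, not part of the verification approach.

```python
from fractions import Fraction
from math import floor

def more_than(x):
    """Least integer strictly greater than the rational x."""
    return floor(x) + 1

def bosco_bounds(n, t):
    big = more_than(Fraction(n + 3 * t, 2))   # least |A| with |A| > (n+3t)/2
    quorum = n - t                            # least |B| with |B| >= n - t
    # (a) two sets larger than (n+3t)/2 share a node outside any t faulty ones
    a_ok = 2 * big - n - t >= 1
    # (b) A and B share more than (n+t)/2 nodes, hence more than (n-t)/2 correct ones
    shared = big + quorum - n
    b_ok = shared > Fraction(n + t, 2) and shared - t > Fraction(n - t, 2)
    return a_ok and b_ok
```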


3.1 Threshold-Based Protocols 


Parameters and Resilience Conditions. We consider protocols whose
definitions depend on a set of parameters, Prm, divided into integer parameters,
Prm_I, and set parameters, Prm_S. Prm_I always includes n, the total number
of nodes (assumed to be finite). Protocol correctness is ensured under a set of
assumptions Γ called resilience conditions, formulated as BAPA formulas over
Prm (this means that all the uninterpreted constants appearing in Γ are from
Prm). In Bosco, Prm_I = {n, t}, where t is the maximal number of Byzantine
failures tolerated by the algorithm, and Prm_S = {f}, where f is the set of Byzantine
nodes; Γ = {n ≥ 3t + 1, |f| ≤ t}.


Threshold Conditions. Both the description of the protocol and the inductive
invariant may include conditions that require the size of some set of nodes to be
“at least t”, “at most t”, and so on, where the threshold t is of the form t = ℓ/k,
where k is a positive integer, and ℓ is a ground BAPA integer term over Prm (we
do not allow comparing sizes of two sets; we observe that it is not needed for
threshold-based protocols). We denote the set of thresholds by T. For example,
in Bosco, T = {n − t, (n+3t+1)/2, (n−t+1)/2}.

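W.l.o.g. we assume that all conditions on set cardinalities are of the form “at
least t”, since every condition can be written this way, possibly by introducing
new thresholds:

$$
\begin{aligned}
|X| > \tfrac{\ell}{k} &\iff |X| \ge \tfrac{\ell+1}{k}\\
|X| \le \tfrac{\ell}{k} &\iff |X^c| \ge \tfrac{k\cdot n-\ell}{k}\\
|X| < \tfrac{\ell}{k} &\iff |X| \le \tfrac{\ell-1}{k} \iff |X^c| \ge \tfrac{k\cdot n-\ell+1}{k}
\end{aligned}
$$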
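Each constraint depends only on |X| and on |X^c| = n − |X|, so the rewrites can be checked exhaustively for small n by ranging over all sizes. The following Python sketch does so for a range of ℓ and k. The helper name is hypothetical, and rationals are handled exactly with fractions.Fraction.

```python
from fractions import Fraction

def rewrites_agree(n, ell, k):
    """Check the three 'at least' rewrites for every size s = |X| of a subset X of U, |U| = n."""
    bound = Fraction(ell, k)
    for s in range(n + 1):
        s_c = n - s  # |X^c|
        if (s > bound) != (s >= Fraction(ell + 1, k)):
            return False
        if (s <= bound) != (s_c >= Fraction(k * n - ell, k)):
            return False
        if (s < bound) != (s_c >= Fraction(k * n - ell + 1, k)):
            return False
    return True
```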


3.2 Modeling in FOL 


FO Vocabulary for Modeling Threshold-Based Protocols. We describe
the protocol’s states (e.g., pending messages, votes, etc.) using a core FO
vocabulary Σ_C that includes sort node and additional sorts and symbols. Parameters
Prm are not part of the FO vocabulary used to model the protocol. Also, we do
not model set cardinality directly. Instead, we encode the cardinality thresholds
in FOL by defining a FO vocabulary Σ_T^Prm:

- For every threshold t we introduce a threshold sort set_t, with the intended
meaning that elements of this sort are sets of nodes whose size is at least t.

- Each sort set_t is equipped with a binary relation symbol member_t between
sorts node and set_t that captures the membership relation of a node in a set.

- For each set parameter a ∈ Prm_S we introduce a unary relation symbol
member_a over sort node that captures membership of a node in the set a.

We then model the protocol as a transition system (Σ, Θ, ι, TR) where
Σ = Σ_C ⊎ Σ_T^Prm.

We are interested only in states (FO structures over X) where the inter- 
pretation of the threshold sorts and membership relations is according to their 
intended meaning in a corresponding BAPA structure. Formally, these are T- 
extensions, defined as follows: 


Definition 1. We say that a FO structure s_C = (D_C, I_C) over Σ_C and a BAPA
structure s_B = (D_B, I_B) over Prm are compatible if D_B(set) = P(D_C(node)),
where P is the powerset operator. For such compatible structures and a set of
thresholds T over Prm, the T-extension of s_C by s_B is the structure s = (D, I)
over Σ defined as follows:

$$
\begin{aligned}
D(s) &= D_C(s) \text{ for every sort } s \text{ in } \Sigma_C & I(a) &= I_C(a) \text{ for every } a \text{ in } \Sigma_C\\
D(\mathit{set}_t) &= \{A \subseteq D_C(\mathit{node}) \mid |A| \ge I_B(t)\} & I(\mathit{member}_a) &= I_B(a)\\
I(\mathit{member}_t) &= \{(e, A) \mid e \in D_C(\mathit{node}),\ A \in D(\mathit{set}_t),\ e \in A\}
\end{aligned}
$$


Note that for the T-extension s to be well defined as a FO structure, we must
have D(set_t) ≠ ∅ for every threshold t ∈ T. This means that a T-extension
by s_B only exists if {A ⊆ D(node) | |A| ≥ I_B(t)} ≠ ∅. This is ensured by the
following condition:

Definition 2 (Feasibility). T is Γ-feasible if Γ ⊨ t ≤ n for every t ∈ T.


Expressing Threshold Constraints. Cardinality constraints can be
expressed in FOL over the vocabulary Σ = Σ_C ⊎ Σ_T^Prm using quantification. To
express that |{n : node | φ(n, ū)}| ≥ t, i.e., that there are at least t nodes that
satisfy the FO formula φ over Σ_C (where ū are free variables in φ), we use the
following first-order formula over Σ: ∃q : set_t. ∀n : node. member_t(n, q) → φ(n, ū).
Similarly, to express the property that a node is a member of a set parameter a
(e.g., to check if n ∈ f, i.e., a node is faulty) we use the FO formula member_a(n).
For example, in Fig. 1, line 5 (right) uses the FO modeling to express the
condition in line 5 (left). This modeling is sound in the following sense:


Lemma 1 (Soundness). Let s_C = (D_C, I_C) be a FO structure over Σ_C,
s_B = (D_B, I_B) a compatible BAPA structure over Prm s.t. s_B ⊨ Γ, and T a
Γ-feasible set of thresholds over Prm. Then there exists a (unique) T-extension
s of s_C by s_B. Further:

1. For every a ∈ Prm_S and FO valuation v: s, v ⊨ member_a(n) iff v(n) ∈ I_B(a),

2. For every t ∈ T, formula φ, and FO valuation v: s, v ⊨ ∃q : set_t. ∀n :
node. member_t(n, q) → φ(n, ū) iff |{e ∈ D(node) | s_C, v[n ↦ e] ⊨ φ(n, ū)}| ≥
I_B(t).
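Part 2 of Lemma 1 can be observed directly on a small T-extension. The Python sketch below makes several simplifying assumptions: there is a single threshold t, the node domain is a finite Python range, and φ ranges over all unary predicates, each represented by the subset of nodes it selects. The names are hypothetical. The sketch builds D(set_t) as in Definition 1 and compares the FO encoding against the cardinality constraint for every φ.

```python
from fractions import Fraction
from itertools import combinations

def threshold_sort(nodes, t):
    """D(set_t) in the T-extension: all node sets with at least t elements."""
    nodes = list(nodes)
    return [frozenset(c) for r in range(len(nodes) + 1) if r >= t
            for c in combinations(nodes, r)]

def fo_threshold(nodes, t, phi):
    """Evaluate ∃q:set_t. ∀m:node. member_t(m, q) → phi(m) in the T-extension."""
    return any(all(phi(m) for m in q) for q in threshold_sort(nodes, t))

def cardinality(nodes, t, phi):
    """Evaluate |{m : node | phi(m)}| >= t directly."""
    return sum(1 for m in nodes if phi(m)) >= t

def lemma1_part2(nodes, t):
    """Compare both sides for every unary predicate phi over the finite node domain."""
    nodes = list(nodes)
    for r in range(len(nodes) + 1):
        for chosen in combinations(nodes, r):
            phi = frozenset(chosen).__contains__
            if fo_threshold(nodes, t, phi) != cardinality(nodes, t, phi):
                return False
    return True
```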


Definition 3. A first-order structure s over Σ is threshold-faithful if it is a
T-extension of some s_C by some s_B ⊨ Γ (as in Lemma 1).
Incompleteness. Lemma 1 ensures that the FO modeling can be soundly used
to verify the protocol. It also ensures that the modeling is precise on
threshold-faithful structures (Def. 3). Yet, the FO transition system is not restricted to
such states, hence it abstracts the actual protocol. To have any hope of verifying
the protocol, we must capture some of the intended meaning of the threshold
sorts and relations. This is obtained by adding FO axioms to the FO transition
system. Soundness is maintained as long as the axioms hold in all
threshold-faithful structures. We note that the set of all such axioms is not recursively
enumerable; this is where the essential incompleteness of our approach lies.


4 Decomposition via Threshold Intersection Properties 


In this section, we identify a set of properties we call threshold intersection 
properties. When captured via FO axioms, these properties suffice for verifying 
many threshold-based protocols (all the ones we considered). Importantly, these 
are properties of sets adhering to the thresholds that do not involve the protocol 
state. As a result, they can be expressed both in FOL and in BAPA. This allows 
us to decompose the verification task into: (i) checking that certain threshold
properties are valid in all threshold-faithful structures by checking that they are
implied by Γ (carried out using quantifier-free BAPA), and (ii) checking that
the verification conditions of the FO transition system with the same threshold
properties taken as axioms are valid (carried out in first-order logic, and in EPR
if quantifier alternations are acyclic).


4.1 Threshold Intersection Property Language 


Threshold properties are expressed in the threshold intersection property lan- 
guage (TIP). TIP is essentially a subset of BAPA, specialized to have the prop- 
erties listed above. 


Syntax. We define TIP as follows, with t ∈ T a threshold (of the form ℓ/k) and
a ∈ Prm_S:

$$
\begin{aligned}
F &::= B \ne \emptyset \mid B^c = \emptyset \mid g_{\ge t}(B) \mid F_1 \land F_2 \mid \forall x : g_{\ge t}.\, F\\
B &::= a \mid a^c \mid x \mid x^c \mid \emptyset \mid \emptyset^c \mid B_1 \cap B_2
\end{aligned}
$$

TIP restricts the use of set cardinality to threshold guards g≥t(b) with the
meaning |b| ≥ t. No other arithmetic atomic formulas are allowed. Comparison
atomic formulas are restricted to b ≠ ∅ and b^c = ∅. Quantifiers must be guarded,
and negation, disjunction and existential quantification are excluded. We forbid
set union and restrict complementation to atomic set terms. We refer to such
formulas as intersection properties since they express properties of intersections
of (atomic) sets.
Example 1. In Bosco, the following property captures the fact that the
intersection of a set of at least n − t nodes and a set of more than (n+3t)/2 nodes
consists of at least (n−t+1)/2 non-faulty nodes. This is needed for establishing
correctness of the protocol.

$$
\forall x : g_{\ge n-t}.\ \forall y : g_{\ge \frac{n+3t+1}{2}}.\ g_{\ge \frac{n-t+1}{2}}(x \cap y \cap f^c)
$$


Semantics. As TIP is essentially a subset of BAPA, we define its semantics by
translating its formulas to BAPA, where most constructs directly correspond to
BAPA constructs, and guards are translated to cardinality constraints (for t = ℓ/k):

$$
B(g_{\ge t}(b)) \triangleq k \cdot |b| \ge \ell \qquad B(\forall x : g.\ \varphi) \triangleq \forall x.\ \lnot B(g(x)) \lor B(\varphi)
$$

The notions of structures, satisfaction, equivalence, validity, satisfiability, etc.
are inherited from BAPA. In particular, given a set of BAPA resilience conditions
Γ over the parameters Prm, we say that a TIP formula φ is Γ-valid, denoted
Γ ⊨ φ, if Γ ⊨ B(φ).

If Γ is quantifier-free (which is the typical case), Γ-validity of TIP formulas
can be checked via validity checks of quantifier-free BAPA formulas,
supported by mature solvers. Note that Γ-validity of a formula of the form
∀x : g≥t1. |x ∩ b| ≥ t2 is equivalent to Γ ⊨ ∀u. u ≥ t1 → u + |b| − n ≥ t2,
which allows quantification over sets to be replaced by quantification over integers,
thus improving the performance of existing solvers.
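Following the reduction just described, the Γ-validity of Example 1 can be probed with integer reasoning alone. The Python sketch below is illustrative and makes assumptions the text does not:

- It extends the reduction to two quantified sets and the parameter f. This uses the fact that |x ∩ y ∩ f^c| can be as small as |x| + |y| + |f^c| − 2n. The bound only grows with |x| and |y|, so the smallest admissible sizes suffice. The target threshold is positive under Γ, so a negative bound needs no special treatment.
- Instead of calling a BAPA solver, it samples parameter values satisfying Γ = {n ≥ 3t + 1, |f| ≤ t} up to a fixed bound. It is therefore a sanity check rather than a proof.

Function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def example1_holds(n, t, f_size):
    """Example 1 at one parameter point, via the integer reduction of set quantifiers."""
    u = n - t                                  # least |x| allowed by g>=n-t
    v = ceil(Fraction(n + 3 * t + 1, 2))       # least |y| allowed by g>=(n+3t+1)/2
    worst = u + v + (n - f_size) - 2 * n       # lower bound on |x ∩ y ∩ f^c|
    return worst >= Fraction(n - t + 1, 2)

def bounded_gamma_valid(t_max=15, n_extra=30):
    """Check all sampled points with n >= 3t+1 and |f| <= t; a bounded test, not a proof."""
    return all(example1_holds(n, t, f_size)
               for t in range(t_max + 1)
               for n in range(3 * t + 1, 3 * t + 1 + n_extra)
               for f_size in range(t + 1))
```

Dropping the resilience condition |f| ≤ t breaks the property. For example, with n = 4, t = 1 and two faulty nodes, the bound is 1, below the required 2.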


4.2 Translation to FOL 


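To verify threshold-based protocols, we translate TIP formulas to FO axioms, using
the threshold sorts and relations. To translate g≥t(b), we follow the principle in
Sect. 3.2:

$$
\begin{aligned}
FO(\lnot\varphi) &= \lnot FO(\varphi) & FO(n \in b^c) &= \lnot FO(n \in b)\\
FO(\varphi_1 \land \varphi_2) &= FO(\varphi_1) \land FO(\varphi_2) & FO(n \in \emptyset) &= \mathit{false}\\
FO(\forall x : g_{\ge t}.\ \varphi) &= \forall x : \mathit{set}_t.\ FO(\varphi) & FO(n \in a) &= \mathit{member}_a(n)\\
FO(n \in b_1 \cap b_2) &= FO(n \in b_1) \land FO(n \in b_2) & FO(n \in x) &= \mathit{member}_t(n, x) \text{ where } x \text{ is guarded by } t\\
FO(b \ne \emptyset) &= \exists n : \mathit{node}.\ FO(n \in b)\\
FO(b^c = \emptyset) &= \forall n : \mathit{node}.\ FO(n \in b)\\
FO(g_{\ge t}(b)) &= \exists x : \mathit{set}_t.\ \forall n : \mathit{node}.\ \mathit{member}_t(n, x) \to FO(n \in b)
\end{aligned}
$$

We lift FO to sets of formulas: FO(A) = {FO(φ) | φ ∈ A}.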
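The translation table is directly recursive, so it can be transcribed almost line by line. The Python sketch below uses a hypothetical tuple encoding of TIP formulas and set terms and produces the FO formula as a string. Each bound set variable is mapped to its guard so that FO(n ∈ x) can pick member_t. The fresh existential variable name q in the guard case is an assumption of the sketch (the table uses x), and q is assumed not to clash with bound set variables. Applied to Example 1, with the threshold names abbreviated to T1, T2 and T3, it yields an axiom whose only quantifier alternations go from set_T1 and set_T2 to set_T3.

```python
def fo_mem(node, b, guards):
    """FO(node ∈ b) for a TIP set term b; guards maps bound set variables to thresholds."""
    op = b[0]
    if op == "param":                      # set parameter a
        return f"member_{b[1]}({node})"
    if op == "var":                        # set variable x guarded by g>=t
        return f"member_{guards[b[1]]}({node}, {b[1]})"
    if op == "compl":                      # complement of an atomic set term
        return f"~{fo_mem(node, b[1], guards)}"
    if op == "empty":
        return "false"
    if op == "inter":
        return f"({fo_mem(node, b[1], guards)} & {fo_mem(node, b[2], guards)})"
    raise ValueError(f"unknown set term {op!r}")

def fo(phi, guards=None):
    """FO(phi) for a TIP formula phi, following the translation table."""
    guards = guards or {}
    op = phi[0]
    if op == "and":
        return f"({fo(phi[1], guards)} & {fo(phi[2], guards)})"
    if op == "forall":                     # ("forall", x, t, body) for ∀x:g>=t. body
        _, x, t, body = phi
        return f"forall {x}:set_{t}. {fo(body, {**guards, x: t})}"
    if op == "nonempty":                   # b != empty
        return f"exists n:node. {fo_mem('n', phi[1], guards)}"
    if op == "compl_empty":                # b^c = empty
        return f"forall n:node. {fo_mem('n', phi[1], guards)}"
    if op == "guard":                      # ("guard", t, b) for g>=t(b)
        _, t, b = phi
        return (f"exists q:set_{t}. forall n:node. "
                f"member_{t}(n, q) -> {fo_mem('n', b, guards)}")
    raise ValueError(f"unknown formula {op!r}")

# Example 1 with thresholds T1 = n-t, T2 = (n+3t+1)/2, T3 = (n-t+1)/2.
EXAMPLE1 = ("forall", "x", "T1", ("forall", "y", "T2", ("guard", "T3",
            ("inter", ("inter", ("var", "x"), ("var", "y")), ("compl", ("param", "f"))))))
```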

Next, we state the soundness of the translation, which intuitively means that
FO(φ) is “equivalent” to φ over threshold-faithful FO structures (Definition 3).
This justifies adding FO(φ) as a FO axiom whenever φ is Γ-valid.


Theorem 1 (Translation soundness). Let s_C = (D_C, I_C) be a first-order
structure over Σ_C, s_B = (D_B, I_B) a compatible BAPA structure over Prm, and
s the T-extension of s_C by s_B. Then for every closed TIP formula φ, we have
s_B ⊨ φ ⇔ s ⊨ FO(φ).

Corollary 1. For every closed TIP formula φ such that Γ ⊨ φ, we have that
FO(φ) is satisfied by every threshold-faithful first-order structure.


5 Automatically Inferring Threshold Intersection Properties


To apply the approach described in Sects. 3 and 4, it is crucial to find suitable
threshold properties. That is, given the resilience conditions Γ and a FO
transition system modeling the protocol, we need to find a set A of TIP formulas
such that (i) Γ ⊨ φ for every φ ∈ A, and (ii) the VCs of the transition system
with the axioms FO(A) are valid.

In this section, we address the problem of automatically inferring such a 
set A. In particular, we prove that for any protocol that satisfies a natural 
condition, there are finitely many Γ-valid TIP formulas (up to equivalence),
enabling a complete automatic inference algorithm. Furthermore, we show that 
(under certain reasonable conditions formalized in this section), the FO axioms 
resulting from the inferred TIP properties have an acyclic quantifier alternation 
graph, facilitating protocol verification in EPR. 


Notation. For the rest of this section, we fix a set Prm of parameters, a set Γ
of resilience conditions over Prm, and a set T of thresholds. Note that b ≠ ∅ ≡
g≥1(b) and b^c = ∅ ≡ g≥n(b). Therefore, for uniformity of the presentation, given
a set T of thresholds, we define T̃ ≜ T ∪ {1, n} and replace atomic formulas of
the form b ≠ ∅ and b^c = ∅ by the corresponding guard formulas. As such, the
only atomic formulas are of the form g≥t(b) where t ∈ T̃. Note that guards in
quantifiers are still restricted to g≥t where t ∈ T. Given a set Prm_S, we also
denote P̃rm_S = Prm_S ∪ {a^c | a ∈ Prm_S}.


5.1 Finding Consequences in the Threshold Intersection Property Language


In this section, we present AIP, an algorithm for inferring all Γ-valid TIP
formulas. A naive (non-terminating) algorithm would iteratively check Γ-validity of
every TIP formula. Instead, AIP prunes the search space relying on the following
condition:


Definition 4. T is Γ-non-degenerate if for every t ∈ T it holds that Γ ⊭ t ≤ 0.

If Γ ⊨ t ≤ 0 then t is degenerate in the sense that g≥t(b) is always Γ-valid, and
∀x : g≥t. g≥t′(x ∩ b) is never Γ-valid unless t′ is also degenerate.

We observe that we can (i) push conjunctions outside of formulas (since ∀
distributes over ∧), and, assuming non-degeneracy, (ii) ignore terms of the form x^c:
Lemma 2. If T is Γ-feasible and Γ-non-degenerate, then for every Γ-valid φ
in TIP, there exist φ1, ..., φm s.t. φ ≡ φ1 ∧ ... ∧ φm and for every 1 ≤ i ≤ m,
φi is of the form:

$$
\forall x_1 : g_{\ge t_1}. \ldots \forall x_q : g_{\ge t_q}.\ g_{\ge t}(x_1 \cap \ldots \cap x_q \cap a_1 \cap \ldots \cap a_k)
$$

where q + k > 0, t1, ..., tq ∈ T, t ∈ T̃, a1, ..., ak ∈ P̃rm_S, and the ai’s are
distinct.

We refer to φi of the form above as simple, and refer to g≥t as its atomic guard.
By Lemma 2, it suffices to generate all simple Γ-valid formulas. Next, we show
that this can be done more efficiently by pruning the search space based on a
subsumption relation that is checked syntactically (up to comparisons between
thresholds), avoiding Γ-validity checks of whole formulas.


Definition 5 (Subsumption). For every h1, h2 ∈ T̃ ∪ P̃rm_S, we denote h1 ⊑Γ
h2 if one of the following holds: (1) h1 = h2, or (2) h1, h2 ∈ T̃ and Γ ⊨ h1 ≥ h2,
or (3) h1 ∈ P̃rm_S, h2 ∈ T̃ and Γ ⊨ |h1| ≥ h2.

For h1, h2 ∈ T̃ and h3 ∈ P̃rm_S, h1 ⊑Γ h2 means that Γ ⊨ ∀x : g≥h1. g≥h2(x),
and h3 ⊑Γ h1 means that Γ ⊨ g≥h1(h3). We lift the relation ⊑Γ to act on simple
formulas:
formulas: 


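Definition 6. Given simple formulas

$$
\begin{aligned}
\alpha &= \forall x_1 : g_{\ge h_1}. \ldots \forall x_q : g_{\ge h_q}.\ g_{\ge t}(x_1 \cap \ldots \cap x_q \cap h_{q+1} \cap \ldots \cap h_k)\\
\beta &= \forall y_1 : g_{\ge h'_1}. \ldots \forall y_{q'} : g_{\ge h'_{q'}}.\ g_{\ge t'}(y_1 \cap \ldots \cap y_{q'} \cap h'_{q'+1} \cap \ldots \cap h'_{k'})
\end{aligned}
$$

we say that α ⊑Γ β if (i) t ⊑Γ t′, and (ii) there exists an injective function
f : {1, ..., k′} → {1, ..., k} s.t. for any 1 ≤ i ≤ k′ it holds that h′_i ⊑Γ h_f(i).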


Lemma 3. Let α, β be simple formulas such that α ⊑Γ β. If Γ ⊨ α then Γ ⊨ β.

Corollary 2. If no simple formula with q quantifiers is Γ-valid then no simple
formula with more than q quantifiers is Γ-valid.


Algorithm 1 depicts AIP, which generates all Γ-valid simple formulas, relying on
Lemma 3. AIP uses a naive search strategy; different strategies can be used
(e.g. [26]). Based on Corollary 2, AIP terminates if for some number of
quantifiers no Γ-valid formula is discovered.

Algorithm 1. Algorithm for Inferring Intersection Properties (AIP)

  Input: Prm_S, T, Γ
  set checked_true = checked_false = [ ];
  foreach q = 0, 1, ... do
      foreach simple formula φ over T and Prm_S with q quantifiers do
          if exists ψ ∈ checked_true s.t. ψ ⊑Γ φ then yield φ;
          else if exists ψ ∈ checked_false s.t. φ ⊑Γ ψ then continue;
          else if Γ ⊨ φ then yield φ; add φ to checked_true;
          else add φ to checked_false;
      if no formulas were added to checked_true then terminate;
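As a rough, runnable illustration of Algorithm 1, the Python sketch below enumerates simple formulas level by level and prunes them with the subsumption of Definitions 5 and 6. It makes several simplifying assumptions that depart from the paper:

- There are no set parameters, so every simple formula has the form ∀x1 : g≥h1. ... ∀xq : g≥hq. g≥t(x1 ∩ ... ∩ xq).
- The thresholds are Bosco's.
- Γ = {n ≥ 3t + 1} is checked on a bounded sample of parameter values instead of by a BAPA solver, so the answers are only as reliable as the sample.

Validity uses the fact that q sets of sizes s1, ..., sq in a universe of n nodes can intersect in as few as max(0, s1 + ... + sq − (q − 1)n) nodes. The injective map of Definition 6 is found with a standard bipartite matching. All names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations_with_replacement
from math import ceil

# Hypothetical instance: Bosco's thresholds T as functions of the parameters (n, t).
THRESHOLDS = {
    "n-t": lambda n, t: Fraction(n - t),
    "(n+3t+1)/2": lambda n, t: Fraction(n + 3 * t + 1, 2),
    "(n-t+1)/2": lambda n, t: Fraction(n - t + 1, 2),
}
# Atomic guards range over T ∪ {1, n}; quantifier guards range over T only.
ATOMIC = {**THRESHOLDS, "1": lambda n, t: Fraction(1), "n": lambda n, t: Fraction(n)}
# Bounded sample of parameter values satisfying Γ = {n >= 3t + 1} (not a decision procedure).
PARAMS = [(n, t) for t in range(5) for n in range(3 * t + 1, 3 * t + 22)]

@lru_cache(maxsize=None)
def below(h1, h2):
    """h1 ⊑Γ h2 for thresholds, i.e. Γ |= h1 >= h2 (checked on the sample)."""
    return all(ATOMIC[h1](n, t) >= ATOMIC[h2](n, t) for n, t in PARAMS)

def valid(phi):
    """Γ-validity of ∀x1:g>=h1 ... ∀xq:g>=hq. g>=atom(x1 ∩ ... ∩ xq) on the sample."""
    guards, atom = phi
    for n, t in PARAMS:
        # q sets of sizes s_i can overlap in as few as sum(s_i) - (q - 1) * n nodes
        least = sum(ceil(THRESHOLDS[h](n, t)) for h in guards) - (len(guards) - 1) * n
        if max(0, least) < ATOMIC[atom](n, t):
            return False
    return True

def matches(small, big):
    """Injective f with small[i] ⊑Γ big[f(i)] (Kuhn's bipartite matching)."""
    owner = {}

    def augment(i, seen):
        for j, h in enumerate(big):
            if j not in seen and below(small[i], h):
                seen.add(j)
                if j not in owner or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    return all(augment(i, set()) for i in range(len(small)))

def subsumes(alpha, beta):
    """alpha ⊑Γ beta (Definition 6, restricted to formulas without set parameters)."""
    return below(alpha[1], beta[1]) and matches(beta[0], alpha[0])

def aip():
    checked_true, checked_false, result = [], [], []
    q = 1  # without set parameters, q + k > 0 forces at least one quantifier
    while True:
        added = False
        for guards in combinations_with_replacement(sorted(THRESHOLDS), q):
            for atom in ATOMIC:
                phi = (guards, atom)
                if any(subsumes(psi, phi) for psi in checked_true):
                    result.append(phi)            # implied by a known valid formula
                elif any(subsumes(phi, psi) for psi in checked_false):
                    continue                      # would imply a known invalid formula
                elif valid(phi):
                    result.append(phi)
                    checked_true.append(phi)
                    added = True
                else:
                    checked_false.append(phi)
        if not added:
            return result
        q += 1

RESULT = aip()
```

On this instance, the search stops once no formula with six quantifiers is valid on the sample. The output includes the quorum-intersection fact ∀x : g≥n−t. ∀y : g≥n−t. g≥1(x ∩ y) and a parameter-free analogue of Example 1.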




Lemma 4 (Soundness). Every formula φ that is returned by the algorithm is
Γ-valid.

Lemma 5 (Completeness). If T is Γ-feasible and Γ-non-degenerate, then
for every Γ-valid TIP formula φ there exist φ1, ..., φm s.t. φ ≡ φ1 ∧ ... ∧ φm and AIP
yields every φi.


Next, we characterize the cases in which there are finitely many Γ-valid TIP
formulas, up to equivalence, and thus AIP is guaranteed to terminate.


Definition 7. T is Γ-sane if for every t1, t2 ∈ T, Γ ⊭ t1 ≤ 0 ∨ t2 > n − 1.
(T, Prm_S) is Γ-sane if, in addition, for every t1 ∈ T, a ∈ Prm_S, Γ ⊭ t1 ≤
0 ∨ |a| = n.

Theorem 2. Assume that T is Γ-feasible. Then the following conditions are
equivalent: (1) There are finitely many Γ-valid simple formulas. (2) There are
finitely many Γ-valid TIP formulas, up to equivalence. (3) T is Γ-sane.

Corollary 3 (Termination). If T is Γ-feasible and Γ-sane, AIP terminates.


5.2 From TIP to Axioms in EPR 


The set of simple formulas generated by AIP, A, is translated to FOL axioms
as described in Sect. 4.2. Next, we show how to ensure that the quantifier
alternation graph (Sect. 2) of FO(A) is acyclic. A simple formula φ induces quantifier
alternation edges in QA(FO(φ)) from the sorts of its universal quantifiers to
the sort of its atomic guard g≥t (or, if t = 1, to the node sort). Therefore, from
Lemma 3, for Γ-valid formulas, cycles in QA(FO(A)) may only occur if they occur in
the graph obtained by ⊑Γ. Furthermore, if QA(FO(φ)) is not acyclic, then the
atomic guard must be equal to one of the quantifier guards. We refer to such a
formula as a cyclic formula. We show that, under the following assumption, we
can eliminate all cyclic formulas from A.


Definition 8. T is Γ-acyclic if for every t1, t2 ∈ T, if Γ ⊨ t1 = t2 then t1 = t2.


Intuitively, if T is not I’-acyclic, then it has (at least) two “equivalent” thresh- 
olds, making one of them redundant. If that is the case, we can alter the protocol 
and its proof so that one of these guards is eliminated and the other one is used 
instead. 


Theorem 3. Let T be Γ-feasible and Γ-acyclic and (T, Prm_S) be Γ-sane. Let
A be the set returned by AIP, and A′ = {φ ∈ A | φ is acyclic}. Then the VCs
of the FO transition system with axioms FO(A) are valid iff they are valid with
axioms FO(A′). Further, QA(FO(A′)) is acyclic.


5.3 Finding Minimal Properties Required for a Protocol 


If $A$ consists of all acyclic $\Gamma$-valid TIP formulas returned by AIP, using $FO(A)$
as FO axioms leads to divergence of the verifier. To overcome this, we propose
two variants.
Minimal Equivalent, $A_{\min}$. Some of the formulas in $FO(A)$ are implied by
others, making them redundant. We remove such formulas using a greedy procedure
that, for every $\varphi_i \in A$, checks whether $FO(A \setminus \{\varphi_i\}) \models FO(\varphi_i)$, and if
so, removes $\varphi_i$ from $A$. Note that if $QA(FO(A))$ is acyclic, the check translates
to (un)satisfiability in EPR.

This procedure results in $A_{\min} \subseteq A$ s.t. $FO(A_{\min}) \equiv FO(A)$ and no strict
subset of $A_{\min}$ satisfies this condition. That is, $A_{\min}$ is a local minimum for
that property.
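The greedy pass can be sketched as follows. As an assumption for illustration, formulas are Python predicates over three Boolean atoms, and entailment is decided by enumerating all assignments. In the procedure described above, the same check would instead be an EPR (un)satisfiability query on the FO translations.

```python
from itertools import product

# Stand-in for FO entailment (an assumption for illustration): formulas are
# predicates over three Boolean atoms, and entailment is decided by
# enumerating all assignments instead of an EPR query.
ATOMS = ("p", "q", "r")


def assignments():
    for bits in product((False, True), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))


def entails(premises, goal):
    """True iff every assignment satisfying all premises satisfies goal."""
    return all(goal(m) for m in assignments() if all(f(m) for f in premises))


def greedy_minimize(formulas):
    """Drop each formula that is implied by the formulas still kept."""
    kept = list(formulas)
    for f in list(kept):
        rest = [g for g in kept if g is not f]
        if entails(rest, f):
            kept = rest
    return kept


def p(m):
    return m["p"]


def q(m):
    return m["q"]


def p_and_q(m):
    return m["p"] and m["q"]


def p_or_r(m):
    return m["p"] or m["r"]


A = [p, q, p_and_q, p_or_r]
A_min = greedy_minimize(A)
print([f.__name__ for f in A_min])  # ['p_and_q']
```

A single pass is enough to obtain the local-minimum property. Removals only weaken the kept set, so a formula that was not implied when it was checked remains non-redundant afterwards.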


Interpolant, $A_{\mathrm{int}}$. There may exist $A_{\mathrm{int}} \subseteq A$ s.t. $FO(A_{\mathrm{int}}) \not\models FO(A)$ but
$FO(A_{\mathrm{int}})$ suffices to prove the first-order VCs and enables discharging the
VCs more efficiently. We compute such a set $A_{\mathrm{int}}$ iteratively. Initially, $A_{\mathrm{int}} = \emptyset$.
In each iteration, we check the VCs. If a counterexample to induction (CTI)
is found, we add to $A_{\mathrm{int}}$ a formula from $A$ not satisfied by the CTI. In this
approach, $A$ is not pre-computed. Instead, AIP is invoked lazily to generate
candidate formulas in reaction to CTIs.
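The lazy loop can be sketched in a few lines. Everything here is a toy stand-in and should be read as an assumption rather than the tool's interface. Models are integers, and the "VC check" looks for a spurious model that satisfies all current axioms and returns it as the CTI. The candidates are a fixed list instead of formulas produced on demand by AIP. Picking the first candidate that the CTI violates is only one simple choice, since the procedure above does not prescribe which formula to add.

```python
def check_vcs(axioms, spurious):
    """Toy VC check: a spurious model that satisfies every axiom plays the
    role of a CTI; None means the VCs hold under these axioms."""
    for model in spurious:
        if all(axiom(model) for axiom in axioms):
            return model
    return None


def lazy_axioms(candidates, spurious):
    """Start from the empty set and add one formula per CTI."""
    a_int = []
    while True:
        cti = check_vcs(a_int, spurious)
        if cti is None:
            return a_int
        # The CTI satisfies every formula already in a_int, so any candidate
        # it violates is new; this guarantees termination for finite lists.
        blocking = next((f for f in candidates if not f(cti)), None)
        if blocking is None:
            raise RuntimeError(f"no candidate formula excludes CTI {cti!r}")
        a_int.append(blocking)


def even(m):
    return m % 2 == 0


def below_five(m):
    return m < 5


def nonzero(m):
    return m != 0


A_int = lazy_axioms([even, below_five, nonzero], spurious=[3, 6])
print([f.__name__ for f in A_int])  # ['even', 'below_five']
```

The candidate `nonzero` is never added because no CTI requires it. This mirrors the observation that the lazy set can be strictly smaller than the full set of candidates.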


6 Evaluation 


We evaluate the approach by verifying several challenging threshold-based dis- 
tributed protocols that use sophisticated thresholds: we verify the safety of 
Bosco [39] (presented in Sect. 3) under its 3 different resilience conditions, the 
safety and liveness (using the liveness to safety reduction presented in [30]) of 
Hybrid Reliable Broadcast [40], and the safety of Byzantine Fast Paxos [23]. 
Hybrid Reliable Broadcast tolerates four different types of faults, while Fast 
Byzantine Paxos is a fast-learning [21,22] Byzantine fault-tolerant consensus 
protocol; fast-learning protocols are notoriously tricky: two such algorithms,
Zyzzyva [17] and FaB [28], were recently shown to be incorrect [1] despite having
been published at major systems conferences.


Implementation. We implemented both algorithms described in Sect. 5.3.
$\mathrm{AIP}_{\mathrm{Eager}}$ eagerly constructs $A$ by running AIP, and then uses EPR reasoning
to remove redundant formulas (whose FO representation is implied by the FO
representation of others). To reduce the number of EPR validity checks used
during this minimization step, we implemented an optimization that allows us
to prove redundancy of TIP formulas internally, based on an extension of the
notion of subsumption from Sect. 5. $\mathrm{AIP}_{\mathrm{Lazy}}$ computes a subset of $A$ while using
AIP in a lazy fashion, guided by CTIs obtained from attempting to verify the FO
transition system. Our implementations use CVC4 to discharge BAPA queries,
and Z3 to discharge EPR queries. Verification of first-order transition systems is
performed using Ivy, which internally uses Z3 as well. All experiments reported
were performed on a laptop running 64-bit Windows 10, with a Core-i5 2.2 GHz
CPU, using Z3 version 4.8.4, CVC4 version 1.7, and the latest version of Ivy.
Figure 2 lists the protocols we verified and the details of the evaluation. Each
experiment was repeated 10 times, and we report the mean time ($\mu$) and standard
deviation ($\sigma$). The figure's caption explains the presented information, and we
discuss the results below.


$\mathrm{AIP}_{\mathrm{Eager}}$. For all protocols, running AIP took less than 1 min (column tc), and
generated all $\Gamma$-valid simple TIP formulas. We observe that for most formulas,
(in)validity is deduced from other formulas by subsumption, and less than
2%-5% of the formulas are actually checked using a BAPA query. With the
optimization of the redundancy check, minimization of the set is performed in
negligible time. The resulting set, $A_{\mathrm{Eager}}$, contains 3-5 formulas, compared to
39-79 before minimization.

Due to the optimization described in Sect. 4 for the BAPA validity queries,
the number of quantifiers in the TIP formulas that are checked by AIP does
not affect the time needed to compute the full $A$. For example, Bosco under
the Strongly One-step resilience condition has $\Gamma$-valid simple TIP formulas
with up to 7 quantifiers (as $n > 7t$ and $t_1 = n - t$), but AIP does not take
significantly longer to find $A$. Interestingly, in this example the $\Gamma$-valid TIP
formulas with more than 3 quantifiers are implied (in FOL) by formulas with at
most 3 quantifiers, as indicated by the fact that these are the only formulas that
remain in $A_{\mathrm{Eager}}$.
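The bound of 7 quantifiers follows from a pigeonhole argument. A set of at least n − t nodes misses at most t nodes, so any k such sets share at least n − kt nodes, which is positive for k = 7 whenever n > 7t. The sketch below computes this bound and cross-checks it by brute force for n = 8, t = 1. It only illustrates the arithmetic and is not how AIP decides validity, since AIP uses BAPA queries for that.

```python
from itertools import combinations, combinations_with_replacement


def min_intersection(n, sizes):
    """Pigeonhole lower bound on |S_1 & ... & S_k| for subsets of an n-element
    set with |S_i| >= sizes[i]: each S_i misses at most n - sizes[i] elements."""
    return max(0, n - sum(n - s for s in sizes))


def brute_min_intersection(n, size, k):
    """Exhaustive minimum over all k-tuples of `size`-element subsets of range(n)."""
    universe = frozenset(range(n))
    subsets = [frozenset(c) for c in combinations(range(n), size)]
    return min(len(universe.intersection(*combo))
               for combo in combinations_with_replacement(subsets, k))


# Small instance of n > 7t with t = 1: quorums of size n - t = 7 out of n = 8.
n, t = 8, 1
print(min_intersection(n, [n - t] * 7), brute_min_intersection(n, n - t, 7))  # 1 1
print(min_intersection(n, [n - t] * 8), brute_min_intersection(n, n - t, 8))  # 0 0
```

With eight such sets, each can miss a different node, so the intersection can be empty. This is why the cutoff sits at seven.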


$\mathrm{AIP}_{\mathrm{Lazy}}$. With the lazy approach based on CTIs, the time for finding the set
of TIP formulas, $A_{\mathrm{Lazy}}$, is generally longer. This is because the run time is
dominated by calls to Ivy with FO axioms that are too weak for verifying the
protocol. However, the resulting $A_{\mathrm{Lazy}}$ has a significant benefit: it lets Ivy prove
the protocol much faster compared to using $A_{\mathrm{Eager}}$. Comparing column ty in $\mathrm{AIP}_{\mathrm{Eager}}$
vs. $\mathrm{AIP}_{\mathrm{Lazy}}$ shows that when the former takes a minute, the latter takes a few
seconds, and when the former times out after 1 h, the latter terminates, usually
in under 1 min. Comparing the formulas of $A_{\mathrm{Eager}}$ and $A_{\mathrm{Lazy}}$ reveals the reason.
While the FO translation of both yields EPR formulas, the formulas resulting
from $A_{\mathrm{Eager}}$ contain more quantifiers and generate many more ground terms,
which degrades the performance of Z3.

Another advantage of the lazy approach is that during the search, it avoids
considering formulas with many quantifiers unless those are actually needed.
Comparing the 3 versions of Bosco, we see that $\mathrm{AIP}_{\mathrm{Lazy}}$ is not sensitive to the
largest number of quantifiers that may appear in a $\Gamma$-valid simple TIP formula.
The downside is that $\mathrm{AIP}_{\mathrm{Lazy}}$ performs many Ivy checks in order to compute the
final $A_{\mathrm{Lazy}}$. The total duration of finding CTIs varies significantly (as demonstrated
under the column ty), in part because it is very sensitive to the CTIs
returned by Ivy, which are in turn affected by the random seed used in the
heuristics of the underlying solver.

Finally, $A_{\mathrm{Lazy}}$ provides more insight into the protocol design, since it presents
minimal assumptions that are required for protocol correctness. Thus, it may be
useful in designing and understanding protocols.
7 Related Work


Fully Automatic Verification of Threshold-Based Protocols. Algorithms 
modeled as Threshold automata (TA) [14] have been studied in [13,16], and ver- 
ified using an automated tool ByMC [15]. The tool also automatically synthe- 
sizes thresholds as arithmetic expressions [24]. Reachability properties of TAs 
for more general thresholds are studied in [18]. There have been recent advances 
in verification of synchronous threshold-based algorithms using TAs [41], and 
of asynchronous randomized algorithms where TAs support coin tosses and an 
unbounded number of rounds [4]. Still, this modeling is very restrictive and not 
as faithful to the pseudo-code as our modeling. 

Another approach for full automation is to use sound and incomplete proce- 
dures for deduction and invariant search for logics that combine quantifiers and 
set cardinalities [8,10]. However, distributed systems of the level of complexity 
we consider here (e.g., Byzantine Fast Paxos) are beyond the reach of these 
techniques. 


Verification of Distributed Protocols Using Decidable Logics. Padon 
et al. [33] introduced an interactive approach for the safety verification of dis- 
tributed protocols based on EPR using the Ivy [29] verification tool. Later 
works extended the approach to more complex protocols [32], their implementa- 
tions [42], and liveness properties [30,31]. Those works verified some threshold 
protocols using ad-hoc first-order modeling and axiomatization of threshold- 
intersection properties, whereas we develop a systematic methodology. Moreover, 
the axioms were not mechanically verified, except in [42], where a simple intersection
property (intersection of two sets with more than $n/2$ nodes) requires a
proof by induction over $n$. The proof relies on a user-provided induction hypothesis
that is automatically checked using the FAU decidable fragment [9]. This
approach requires user ingenuity even for a simple intersection property, and we
expect that it would not scale to the more complex properties required for, e.g.,
Bosco or Fast Byzantine Paxos. In contrast, our approach completely automates
both verification and inference of threshold-intersection properties required to 
verify protocol correctness. 

Dragoi et al. [6] propose a decidable logic supporting cardinalities, uninter- 
preted functions, and universal quantifiers for verifying consensus algorithms 
expressed in the partially synchronous Heard-Of Model. As in this paper, the 
user is expected to provide an inductive invariant. The PSync framework [7] 
extends the approach to protocol implementations. Compared to our approach, 
the approach of Dragoi et al. is less flexible due to the specialized logic used and 
the restrictions of the Heard-Of Model. 

Our approach decomposes verification into EPR and BAPA. Piskac [34] 
presents a decidable logic that combines BAPA and EPR, with some restric- 
tions. The verification conditions of the protocols we consider are outside the 
scope of this fragment since they include cardinality constraints in the scope of 
quantifiers. Furthermore, this logic is not supported by mature solvers. Instead 
of looking for a specialized logic per protocol, we rely on a decomposition which
allows more flexibility. 

Recently, [11] presented an approach for verifying asynchronous algorithms 
by reduction to synchronous verification. This technique is largely orthogonal 
and complementary to our approach, which is focused on the challenge of cardi- 
nality thresholds. 


Verification using interactive theorem provers. We are not aware of works based 
on interactive theorem provers that verified protocols with complex thresholds 
as we do in this work (although doing so is of course possible). However, many 
works used interactive theorem provers to verify related protocols, e.g., [12, 27,
36-38, 43] (the most related protocols use either $n/2$ or $2n/3$ as the only thresholds;
other protocols do not involve any thresholds). The downside of verification
using interactive theorem provers is that it requires tremendous human effort
and skill. For example, the Verdi proof of Raft included 50,000 lines of proof
in Coq for 500 lines of code [44].


8 Conclusion 


This paper proposes a new deductive verification approach for threshold-based 
distributed protocols by decomposing the verification problem into two well- 
established decidable logics, BAPA and EPR, thus allowing greater flexibility 
compared to monolithic approaches based on domain-specific, specialized logics. 
The user models their protocol in EPR, defines the thresholds and resilience 
conditions using arithmetic in BAPA, and provides an inductive invariant. An 
automatic procedure infers threshold intersection properties expressed in TIP 
that are both (1) sound w.r.t. the resilience conditions (checked in quantifier- 
free BAPA) and (2) sufficient to discharge the VCs (checked in EPR). Both 
logics are supported by mature solvers, and allow providing the user with an 
understandable counterexample in case verification fails. 

Our evaluation, which includes notoriously tricky fast-learning consensus pro- 
tocols, shows that threshold intersection properties are inferred in a matter of 
minutes. While this may be too slow for interactive use, we expect improvements 
such as memoization and parallelism to provide response times of a few seconds 
in an iterative, interactive setting. Another potential future direction is combin- 
ing our inference algorithm with automated invariant inference algorithms. 
Abstract. We address the problem of checking that computations of a
shared memory implementation (with write and read operations) adhere
to some given consistency model. It is known that checking conformance
to Sequential Consistency (SC) for a given computation is NP-hard, and 
the same holds for checking Total Store Order (TSO) conformance. This 
poses a serious issue for the design of scalable verification or testing 
techniques for these important memory models. In this paper, we tackle 
this issue by providing an approach that avoids systematically hitting
the worst-case complexity. The idea is to consider, as an intermediary 
step, the problem of checking weaker criteria that are as strong as pos- 
sible while they are still checkable in polynomial time (in the size of 
the computation). The criteria we consider are new variations of causal 
consistency suitably defined for our purpose. The advantage of our approach
is that in many cases (1) it can catch violations of SC/TSO early
using these weaker criteria that are efficiently checkable, and (2) when 
a computation is causally consistent (according to our newly defined 
criteria), the work done for establishing this fact simplifies significantly 
the work required for checking SC/TSO conformance. We have imple- 
mented our algorithms and carried out several experiments on realistic 
cache-coherence protocols showing the efficiency of our approach. 


1 Introduction 


This paper addresses the problem of checking whether a given implementation of
a shared memory offers the expected consistency guarantees to its clients, which
are concurrent programs composed of several threads running in parallel. Indeed,
users of a memory need to see it as an abstract object allowing them to perform
concurrent reads and writes over a set of variables, which conform to some memory
model defining the valid visible sequences of such operations. Various memory 
models can be considered in this context. Sequential Consistency (SC) [24] is 
the model where operations can be seen as atomic, executing according to some 
interleaving of the operations issued by the different threads, while preserving
the order in which these operations were issued by each of the threads. This 
fundamental model offers strong consistency in the sense that for each write 
operation, when it is issued by a thread, it is immediately visible to all the other 
threads. Other weaker memory models are adopted in order to meet performance 
and/or availability requirements in concurrent/distributed systems. One of the 
most widely used models in this context is Total Store Order (TSO) [29]. In this 
model, writes can be delayed, which means that after a write is issued, it is not 
immediately visible to all threads (except for the thread that issued it), and it is 
committed later after some arbitrary delay. However, writes issued by the same 
thread are committed in the same order as they were issued, and when a write
is committed it becomes visible to all the other threads simultaneously. TSO is 
implemented in hardware but also in a distributed context over a network [22]. 

Implementing shared memories that are both highly performant and correct 
with respect to a given memory model is an extremely hard and error-prone
task. Therefore, checking that a given implementation is indeed correct from 
this point of view is of paramount importance. In this paper we address the 
issue of checking that a given execution of a shared memory implementation is 
consistent, and we consider as consistency criteria the cases of SC and TSO. 

Checking SC or TSO conformance is known to be NP-complete [18,21]. This 
is due to the fact that in order to justify that the execution is consistent, one 
has to find a total order between the writes which explains the read operations 
happening along the computation. It can be proved that one cannot avoid enu- 
merating all the possible total orders between writes, in the worst case. The 
situation is different for other weaker criteria such as Causal Consistency (CC) 
and its different variations, which have been shown to be checkable in polynomial
time (in the size of the computation) [6]. In fact, CC imposes fewer
constraints than SC/TSO on the order between writes, and the way it imposes 
these constraints is “deterministic”, in the sense that they can be derived from 
the history of the execution by applying a least fixpoint computation (which 
can be encoded, for instance, as a standard DATALOG program). All these
complexity results hold under the assumption that each value is written at most
once, which is without loss of generality for implementations which are data- 
independent [31], i.e., their behavior doesn’t depend on the concrete values read 
or written in the program. Indeed, any buggy behavior of such implementations 
can be exposed in executions satisfying this assumption.¹

The intrinsic hardness of the problem of checking SC/TSO poses a crucial 
issue for the design of scalable verification or testing techniques for these impor- 
tant consistency models. Tackling this issue requires the development of practical 
approaches that can work well (with polynomial complexity) when the problem
instance does not exhibit the worst-case (exponential) complexity.


1 All the CC variations become NP-complete without the assumption that each value 
is written at most once [6]. This holds for the variations of CC we introduce in this 
paper as well. 
The purpose of this paper is to propose such an approach. The idea is to
reduce the amount of “nondeterminism” in searching for the write orders in order 
to establish SC/TSO conformance. For that, our approach for SC is to consider 
a weaker consistency model called CCM (for Convergent Causal Memory), that 
is “as strong as possible” while being polynomial-time checkable. In fact, CCM
is stronger than both causal memory [2,26] (CM) and causal convergence [7] 
(CCv), two other well-known variations of causal consistency. Then, if CCM 
is already violated by the given computation then we can conclude that the 
computation does not satisfy the stronger criterion SC. Here the hope is that 
in practice many computations violating SC can be caught already at this stage 
using a polynomial time check. Now, in the case that the computation does not 
violate CCM, we exploit the fact that establishing CCM already imposes a set of 
constraints on the order between writes. We show that these constraints form a 
partial order which must be a subset of any total write order witnessing
SC conformance. Therefore, at this point, it is enough to find an extension
of this partial write order, and the hope is that in many practical cases, this set
of constraints is already large enough, leaving only a small number of pairs of
writes to be ordered when checking SC conformance. For the case of TSO, we
proceed in the same way, but we consider a different intermediary polynomial 
time checkable criterion called weak CCM (wCCM). This is due to the fact 
that some causality constraints need to be relaxed in order to take into account 
the program order relaxations of TSO, that allow reads to overtake writes. The 
definitions of the new criteria CCM and wCCM we use in our approach are quite 
subtle. Ensuring that these criteria are “as strong as possible” by including all
possible order constraints on pairs of writes that can be computed (in polynomial 
time) using a least fixpoint calculation, while still ensuring that they are weaker 
than SC/TSO, and proving this fact, is not trivial. 

As a proof of concept, we implemented our approach for checking SC/TSO 
and applied it to executions extracted from realistic cache coherence protocols 
within the Gem5 simulator [5] in system emulation mode. This evaluation shows 
that our approach scales better than a direct encoding of the axioms defining 
SC and TSO [3] into Boolean satisfiability. We also show that the partial order
of writes imposed by the stronger criteria CCM and wCCM leaves only a small
percentage of writes unordered (6.6% on average) in the case that the executions
are valid, and most SC/TSO violations are also CCM/wCCM violations. 


2 Sequential Consistency and TSO 


We consider multi-threaded programs over a set of shared variables Var = 
{x,y,...}. Threads issue read and write operations. Assuming an unspecified 
set of values Val and a set of operation identifiers OId, we let


$$Op = \{read_i(x, v), write_i(x, v) : i \in OId, x \in Var, v \in Val\}$$


be the set of operations reading a value v or writing a value v to a variable 
x. We omit operation identifiers when they are not important. The set of read, 


270 R. Zennou et al. 


resp., write, operations is denoted by R, resp., W. The set of read, resp., write, 
operations in a set of operations O is denoted by R(O), resp., W(O). The variable 
accessed by an operation o is denoted by var(o). 

Consistency criteria like SC or TSO are formalized on an abstract view of 
an execution called history. A history includes a set of write or read operations
ordered according to a (partial) program order po, which orders operations
issued by the same thread. Most often, po is a union of sequences, each sequence 
containing all the operations issued by some thread. Then, we assume that 
the history includes a write-read relation which identifies the write operation 
writing the value returned by each read in the execution. Such a relation can 
be extracted easily from executions where each value is written at most once. 
Since shared-memory implementations (or cache coherence protocols) are data- 
independent [31] in practice, i.e., their behavior doesn’t depend on the concrete 
values read or written in the program, any potential buggy behavior can be 
exposed in such executions. 
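Under the unique-value assumption, building wr takes a single lookup per read. The following is a minimal sketch. It assumes a hypothetical encoding in which an execution is a list of (kind, variable, value) triples and each operation is identified by its position in that list.

```python
def extract_wr(ops):
    """Build wr from an execution in which each (variable, value) pair is
    written at most once. ops[i] = (kind, var, value), kind in {"w", "r"}."""
    writer = {}
    for i, (kind, var, val) in enumerate(ops):
        if kind == "w":
            if (var, val) in writer:
                raise ValueError(f"value {val!r} written twice to {var!r}")
            writer[(var, val)] = i
    wr = set()
    for i, (kind, var, val) in enumerate(ops):
        if kind == "r":
            if (var, val) not in writer:
                raise ValueError(f"read {i} returns a value never written")
            wr.add((writer[(var, val)], i))
    return wr


ops = [("w", "x", 0), ("w", "x", 1), ("r", "x", 1), ("r", "x", 0)]
print(sorted(extract_wr(ops)))  # [(0, 3), (1, 2)]
```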


Definition 1. A history $(O, po, wr)$ is a set of operations $O$ along with a strict
partial program order $po$ and a write-read relation $wr \subseteq W(O) \times R(O)$, such
that the inverse of $wr$ is a total function and if $(write(x, v), read(x', v')) \in wr$,
then $x = x'$ and $v = v'$.


We assume that every history includes a write operation writing the initial 
value of variable x, for each variable x. These write operations precede all other 
operations in po. We use $h, h_1, h_2, \ldots$ to range over histories.

We now define the SC and TSO memory models (we use the same definitions 
as in the formal framework developed by Alglave et al. [3]). Given a history
$h = (O, po, wr)$ and a variable $x$, a store order on $x$ is a strict total order $ww_x$ on
the write operations $write(x, \_)$ in $O$. A store order is a union of store orders $ww_x$,
one for each variable $x$ used in $h$. A history $(O, po, wr)$ is sequentially consistent
(SC, for short) if there exists a store order $ww$ such that $po \cup wr \cup ww \cup rw$ is
acyclic. The read-write relation $rw$ is defined by $rw = wr^{-1} \circ ww$ (where $\circ$ denotes
the standard relation composition).
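Read literally, this definition suggests a brute-force check: enumerate one permutation of the writes per variable and test acyclicity of the union. The sketch below does exactly that, so it is exponential in the number of writes, which is the cost the approach in this paper tries to avoid. The history encoding is an assumption for illustration. Operations are (kind, variable, value) triples identified by position, initial writes are listed explicitly, and po and wr are sets of index pairs. The example is the classic store-buffering test, which is not SC.

```python
from itertools import permutations, product


def is_acyclic(n, edges):
    """Kahn's algorithm over nodes 0..n-1."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == n


def is_sc(ops, po, wr):
    """Try every store order ww (one permutation of the writes per variable)."""
    by_var = {}
    for i, (kind, var, _) in enumerate(ops):
        if kind == "w":
            by_var.setdefault(var, []).append(i)
    for orders in product(*(permutations(ws) for ws in by_var.values())):
        ww = {(a, b) for order in orders
              for k, a in enumerate(order) for b in order[k + 1:]}
        rw = {(r, w2) for (w, r) in wr for (w1, w2) in ww if w1 == w}  # wr^-1 ; ww
        if is_acyclic(len(ops), set(po) | set(wr) | ww | rw):
            return True
    return False


# Store buffering: t0 = write(x,1); read(y,0)   t1 = write(y,1); read(x,0)
ops = [("w", "x", 0), ("w", "y", 0),   # initial writes
       ("w", "x", 1), ("r", "y", 0),   # thread t0
       ("w", "y", 1), ("r", "x", 0)]   # thread t1
po = {(0, 2), (1, 2), (0, 4), (1, 4), (2, 3), (4, 5)}
wr = {(1, 3), (0, 5)}
print(is_sc(ops, po, wr))  # False
```

If the second thread reads 1 instead (with wr = {(1, 3), (2, 5)}), the same function returns True.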

The definition of TSO relies on three additional relations: (1) the $ppo$ relation,
which excludes from the program order the pairs formed of a write followed by
a read, i.e., $ppo = po \setminus (W(O) \times R(O))$, (2) the $po\text{-}loc$ relation, which
is a restriction of $po$ to operations accessing the same variable, i.e.,
$po\text{-}loc = po \cap \{(o, o') \mid var(o) = var(o')\}$, and (3) the write-read external relation $wre$, which
is a restriction of the write-read relation to pairs of operations in different threads
(not related by program order), i.e., $wre = wr \cap \{(o, o') \mid (o, o') \notin po \text{ and } (o', o) \notin po\}$.
Then, we say that a history satisfies TSO if there exists a store order $ww$
such that $po\text{-}loc \cup wre \cup ww \cup rw$ and $ppo \cup wre \cup ww \cup rw$ are both acyclic.
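The TSO axioms can be checked by the same kind of enumeration once ppo, po-loc and wre are derived from po and wr. The following is a sketch under an assumed encoding: operations are (kind, variable, value) triples identified by position, and initial writes are explicit. These three relations are defined on the full program order, so the sketch first takes the transitive closure of po. The store-buffering history, which is not SC, is accepted here because ppo drops the write-to-read pairs.

```python
from itertools import permutations, product


def closure(pairs):
    """Transitive closure by naive fixpoint iteration."""
    rel = set(pairs)
    while True:
        new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c} - rel
        if not new:
            return rel
        rel |= new


def acyclic(pairs):
    """A relation is acyclic iff its transitive closure is irreflexive."""
    return all(a != b for (a, b) in closure(pairs))


def is_tso(ops, po, wr):
    """ops[i] = (kind, var, value); po and wr are sets of index pairs."""
    po = closure(po)  # ppo, po-loc and wre are defined on the full order
    ppo = {(a, b) for (a, b) in po if not (ops[a][0] == "w" and ops[b][0] == "r")}
    po_loc = {(a, b) for (a, b) in po if ops[a][1] == ops[b][1]}
    wre = {(w, r) for (w, r) in wr if (w, r) not in po and (r, w) not in po}
    by_var = {}
    for i, (kind, var, _) in enumerate(ops):
        if kind == "w":
            by_var.setdefault(var, []).append(i)
    for orders in product(*(permutations(ws) for ws in by_var.values())):
        ww = {(a, b) for order in orders
              for k, a in enumerate(order) for b in order[k + 1:]}
        rw = {(r, w2) for (w, r) in wr for (w1, w2) in ww if w1 == w}
        if acyclic(po_loc | wre | ww | rw) and acyclic(ppo | wre | ww | rw):
            return True
    return False


# Store buffering: t0 = write(x,1); read(y,0)   t1 = write(y,1); read(x,0)
ops = [("w", "x", 0), ("w", "y", 0),   # initial writes
       ("w", "x", 1), ("r", "y", 0),   # thread t0
       ("w", "y", 1), ("r", "x", 0)]   # thread t1
po = {(0, 2), (1, 2), (0, 4), (1, 4), (2, 3), (4, 5)}
wr = {(1, 3), (0, 5)}
print(is_tso(ops, po, wr))  # True
```

A thread that reads an older value of a variable after its own write to that variable is still rejected, because po-loc keeps that write-to-read pair.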

Notice that the formal definition of the TSO given above is equivalent to the 
formal operational model of TSO that consists in considering that each thread 
has a store buffer, and then, each write issued by a thread is first sent to its 
store buffer before being committed to the memory later in a nondeterministic 
way. To read a value on some variable x, a thread first checks whether there is still
a write on x pending in its own buffer, and in this case it takes the value of the
last such write; otherwise, it fetches the value of x from the memory.
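For tiny programs, the operational model can be explored exhaustively. The following sketch is our own illustration, and its instruction encoding and register names are hypothetical. It interleaves thread steps with FIFO commits from per-thread buffers and collects the final register values. Disabling buffering makes every write immediately visible, which yields the SC outcomes for comparison.

```python
def outcomes(programs, init, buffered=True):
    """Final register values of all runs of a store-buffer machine.
    programs[t] lists steps ("w", var, value) or ("r", var, register).
    With buffered=False, writes go straight to memory (SC behaviour)."""
    results = set()

    def explore(pcs, bufs, mem, regs):
        finished = all(pc == len(prog) for pc, prog in zip(pcs, programs))
        if finished and not any(bufs):
            results.add(tuple(sorted(regs.items())))
            return
        for t, prog in enumerate(programs):
            if bufs[t]:  # commit the oldest pending write of thread t (FIFO)
                (x, v), rest = bufs[t][0], bufs[t][1:]
                explore(pcs, bufs[:t] + (rest,) + bufs[t + 1:], {**mem, x: v}, regs)
            if pcs[t] < len(prog):
                kind, x, arg = prog[pcs[t]]
                npcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "w" and buffered:
                    nbufs = bufs[:t] + (bufs[t] + ((x, arg),),) + bufs[t + 1:]
                    explore(npcs, nbufs, mem, regs)
                elif kind == "w":
                    explore(npcs, bufs, {**mem, x: arg}, regs)
                else:  # read: newest own pending write on x, else memory
                    own = [v for (y, v) in bufs[t] if y == x]
                    value = own[-1] if own else mem[x]
                    explore(npcs, bufs, mem, {**regs, arg: value})

    n = len(programs)
    explore((0,) * n, ((),) * n, dict(init), {})
    return results


sb = [[("w", "x", 1), ("r", "y", "r0")],   # thread t0
      [("w", "y", 1), ("r", "x", "r1")]]   # thread t1
both_zero = (("r0", 0), ("r1", 0))
print(both_zero in outcomes(sb, {"x": 0, "y": 0}))                  # True
print(both_zero in outcomes(sb, {"x": 0, "y": 0}, buffered=False))  # False
```

The exploration is exponential and is meant only to illustrate the semantics. It is not a practical way to check conformance.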


3 Checking Sequential Consistency 


We define an algorithm for checking whether a history satisfies SC which enforces 
a polynomial-time checkable criterion weaker than SC, a variation of causal
consistency, in order to construct a partial store order, i.e., one in which not 
all the writes on the same variable are ordered. This partial store order is then 
completed until it orders every two writes on the same variable using a standard 
backtracking enumeration. This approach is efficient when the number of writes 
that remain to be ordered using the backtracking enumeration is relatively small, 
a hypothesis confirmed by our experimental evaluation (see Sect. 5).
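The backtracking completion amounts to enumerating linear extensions of the partial store order, one variable at a time. The sketch below uses hypothetical write names and constraints and omits the acyclicity test that would be run on each completed store order. It shows how just two ordering constraints reduce the candidate orders for four writes from 24 to 8.

```python
def linear_extensions(items, before):
    """Yield every total order of `items` that respects the partial order
    `before`, given as (a, b) pairs meaning a must come before b."""
    preds = {x: {a for (a, b) in before if b == x} for x in items}

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for x in sorted(remaining):
            if preds[x] <= set(prefix):  # every required predecessor is placed
                prefix.append(x)
                yield from extend(prefix, remaining - {x})
                prefix.pop()

    yield from extend([], frozenset(items))


writes = ["a", "b", "c", "d"]
partial = {("a", "b"), ("a", "c")}  # hypothetical constraints on one variable
print(sum(1 for _ in linear_extensions(writes, partial)))  # 8
print(sum(1 for _ in linear_extensions(writes, set())))    # 24
```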

The variation of causal consistency mentioned above, called convergent causal 
memory (CCM, for short), is stronger than existing variations [6] while still being 
polynomial-time checkable (and weaker than SC). Its definition uses several
relations between read and write operations which are analogous or even exactly 
the same relations used to define those variations. Section 3.1 recalls the existing 
notions of causal consistency as they are defined in [6] (using the so called “bad- 
pattern” characterization introduced in that paper), Sect. 3.2 introduces CCM, 
while Sect. 3.3 presents our algorithm for checking SC. 


3.1 Causal Consistency 


The weakest variation of causal consistency, called weak causal consistency (CC, 
for short), requires that any two causally-dependent values are observed in the 
same order by all threads, where causally-dependent means that either those 
values were written by the same thread (i.e., the corresponding writes are ordered 
by po), or that one value was written by a thread after reading the other value, 
or any transitive composition of such dependencies. Values written concurrently 
by two threads can be observed in any order, and moreover, this order may
change over time. A history $(O, po, wr)$ satisfies CC if $po \cup wr \cup rw[co]$ is acyclic,
where $co = (po \cup wr)^+$ is called the causal relation. The read-write relation $rw[co]$
induced by the causal relation is defined by


$$(read(x, v), write(x, v')) \in rw[co] \iff (write(x, v), write(x, v')) \in co \text{ and } (write(x, v), read(x, v)) \in wr, \text{ for some } write(x, v)$$


The read-write relation $rw[co]$ is a variation of $rw$ from the definition of
SC/TSO where the store order $ww$ is replaced by the projection of $co$ on pairs
of writes. In general, given a binary relation $R$ on operations, $R_{WW}$ denotes the
projection of $R$ on pairs of writes on the same variable. Then,


Definition 2. The read-write relation $rw[R]$ induced by a relation $R$ is defined
by $rw[R] = wr^{-1} \circ R_{WW}$.
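Since co, rw[co] and the union are all obtained by closure computations, checking CC takes polynomial time. The following sketch assumes an encoding chosen for illustration: operations are (kind, variable, value) triples identified by position, and the initial write is listed explicitly. With this encoding, the history of Fig. 1a is accepted. A single-thread history that reads an overwritten value of its own is rejected.

```python
def closure(n, edges):
    """Transitive closure over nodes 0..n-1, by DFS from every node."""
    successors = [[] for _ in range(n)]
    for a, b in edges:
        successors[a].append(b)
    reach = set()
    for s in range(n):
        stack, seen = list(successors[s]), set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(successors[v])
        reach |= {(s, v) for v in seen}
    return reach


def rw_of(R, ops, wr):
    """rw[R] = wr^-1 ; R_WW, where R_WW keeps pairs of writes on one variable."""
    r_ww = {(a, b) for (a, b) in R
            if ops[a][0] == ops[b][0] == "w" and ops[a][1] == ops[b][1]}
    return {(r, w2) for (w, r) in wr for (w1, w2) in r_ww if w1 == w}


def is_cc(ops, po, wr):
    n = len(ops)
    co = closure(n, set(po) | set(wr))            # causal relation (po U wr)+
    union = set(po) | set(wr) | rw_of(co, ops, wr)
    return all(a != b for (a, b) in closure(n, union))


# Fig. 1a with an explicit initial write of x (index 0).
fig1a = [("w", "x", 0),
         ("w", "x", 1), ("r", "x", 2),            # t0
         ("w", "x", 2), ("r", "x", 1)]            # t1
print(is_cc(fig1a, {(0, 1), (0, 3), (1, 2), (3, 4)}, {(3, 2), (1, 4)}))  # True

# One thread writes x twice, then reads the overwritten value 1.
stale = [("w", "x", 0), ("w", "x", 1), ("w", "x", 2), ("r", "x", 1)]
print(is_cc(stale, {(0, 1), (1, 2), (2, 3)}, {(1, 3)}))  # False
```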
Causal convergence (CCv, for short) is a strengthening of CC where concurrent
values are required to be observed in the same order by all threads.

A history $(O, po, wr)$ satisfies CCv if it satisfies CC and $po \cup wr \cup cf$ is acyclic,
where the conflict relation $cf$ is defined by


$$(write(x, v), write(x, v')) \in cf \iff (write(x, v), read(x, v')) \in co \text{ and } (write(x, v'), read(x, v')) \in wr, \text{ for some } read(x, v')$$


The conflict relation relates two writes $w_1$ and $w_2$ when $w_1$ is causally related
to a read taking its value from $w_2$. The definition of CCM, our new variation
of causal consistency, relies on a generalization of the conflict relation where a
different relation is used instead of $co$. Given a binary relation $R$ on operations,
$R_{WR}$ denotes the projection of $R$ on pairs of writes and reads on the same
variable, respectively.


Definition 3. The conflict relation $cf[R]$ induced by a relation $R$ is defined by
$cf[R] = R_{WR} \circ wr^{-1}$.
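The conflict part of CCv can be computed with the same closure machinery. The following sketch makes several assumptions. Operations are (kind, variable, value) triples identified by position, and initial writes are explicit. Pairs in which w1 is the read's own source are skipped, because cf relates two distinct writes. On Fig. 1a, cf[co] orders the two writes of x in both directions, so the union has a cycle. On Fig. 1b, the union stays acyclic. The function checks only the condition that CCv adds to CC.

```python
def closure(n, edges):
    """Transitive closure over nodes 0..n-1, by DFS from every node."""
    successors = [[] for _ in range(n)]
    for a, b in edges:
        successors[a].append(b)
    reach = set()
    for s in range(n):
        stack, seen = list(successors[s]), set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(successors[v])
        reach |= {(s, v) for v in seen}
    return reach


def cf_of(R, ops, wr):
    """cf[R] = R_WR ; wr^-1: w1 is R-before a read of the same variable whose
    value comes from w2. Pairs with w1 == w2 are skipped (two distinct writes)."""
    writer = {r: w for (w, r) in wr}
    return {(w1, writer[r]) for (w1, r) in R
            if ops[w1][0] == "w" and ops[r][0] == "r"
            and ops[w1][1] == ops[r][1] and writer[r] != w1}


def conflict_acyclic(ops, po, wr):
    """The condition CCv adds to CC: po U wr U cf[co] is acyclic."""
    n = len(ops)
    co = closure(n, set(po) | set(wr))
    union = set(po) | set(wr) | cf_of(co, ops, wr)
    return all(a != b for (a, b) in closure(n, union))


fig1a = [("w", "x", 0), ("w", "x", 1), ("r", "x", 2), ("w", "x", 2), ("r", "x", 1)]
fig1a_po, fig1a_wr = {(0, 1), (0, 3), (1, 2), (3, 4)}, {(3, 2), (1, 4)}
fig1b = [("w", "x", 0), ("w", "y", 0), ("w", "z", 0),                 # initial
         ("w", "z", 1), ("w", "x", 1), ("w", "y", 1),                 # t0
         ("w", "x", 2), ("r", "z", 0), ("r", "y", 1), ("r", "x", 2)]  # t1
fig1b_po = ({(i, j) for i in (0, 1, 2) for j in (3, 6)}
            | {(3, 4), (4, 5), (6, 7), (7, 8), (8, 9)})
fig1b_wr = {(2, 7), (5, 8), (6, 9)}
print(conflict_acyclic(fig1a, fig1a_po, fig1a_wr))  # False
print(conflict_acyclic(fig1b, fig1b_po, fig1b_wr))  # True
```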


(a) CM but not CCv nor wCCM
    t0: write(x, 1); read(x, 2)
    t1: write(x, 2); read(x, 1)

(b) CCv, wCCM and TSO but not CM
    t0: write(z, 1); write(x, 1); write(y, 1)
    t1: write(x, 2); read(z, 0); read(y, 1); read(x, 2)

(c) CM and CCv but not CCM

(d) CCM but not SC


Fig. 1. Histories with two threads used to compare different consistency models. Oper- 
ations of the same thread are aligned vertically. 


Finally, causal memory (CM, for short) is a strengthening of CC where,
roughly, concurrent values are required to be observed in the same order by
a thread during its entire execution. Unlike CCv, this order can differ
from one thread to another. Although this intuitive description seems to imply
that CM is weaker than CCv, the two models are actually incomparable. For
instance, the history in Fig. 1a is allowed by CM, but not by CCv. It is not
allowed by CCv because reading 1 from x in the second thread implies that it
observed write(x, 1) after write(x, 2), while reading 2 from x in the first thread
implies that it observed write(x, 2) after write(x, 1). While this is allowed by CM,
where different threads can observe concurrent writes in different orders, it is
not allowed by CCv. Then, the history in Fig. 1b is CCv but not CM. It is not
allowed by CM because reading the initial value 0 from z implies that write(z, 1)
is observed after write(x, 2), while reading 2 from x implies that write(x, 2) is
observed after write(x, 1) (write(x, 1) must have been observed because the same
thread reads 1 from y and the writes on x and y are causally related). However,
under CCv, a thread simply reads the most recent value on each variable, and the
order between these values (fixed, for instance, using timestamps) is independent
of the order in which variables are read in a thread; e.g., reading 0 from
z doesn't imply that the timestamp of write(x, 2) is smaller than the timestamp
of write(z, 1). This history is admitted by CCv assuming that write(x, 1) is
observed before write(x, 2).

Let us give the formal definition of CM. Let $h = (O, po, wr)$ be a history. For
every operation $o$ in $h$, let $hb_o$ be the smallest transitive relation such that:


1. if two operations are causally related, and each one is causally related to $o$, then
they are related by $hb_o$, i.e., $(o_1, o_2) \in hb_o$ if $(o_1, o_2) \in co$, $(o_1, o) \in co$, and
$(o_2, o) \in co^*$ (where $co^*$ is the reflexive closure of $co$), and

2. two writes $w_1$ and $w_2$ are related by $hb_o$ if $w_1$ is $hb_o$-related to a read taking
its value from $w_2$, and that read is done by the same thread executing $o$
and before $o$ (this scenario is similar to the definition of the conflict relation
above), i.e., $(write(x, v), write(x, v')) \in hb_o$ if $(write(x, v), read(x, v')) \in hb_o$,
$(write(x, v'), read(x, v')) \in wr$, and $(read(x, v'), o) \in po^*$, for some $read(x, v')$.


A history $(O, po, wr)$ satisfies CM if it satisfies CC and for each operation $o$
in the history, the relation $hb_o$ is acyclic.
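Each $hb_o$ is a least fixpoint, so it can be computed by saturation. The relation is seeded with rule 1, and rule-2 pairs and transitive pairs are then added until nothing changes. The following sketch covers only the second half of CM, since CC must be checked separately. The encoding is assumed for illustration: operations are (kind, variable, value) triples identified by position, and initial writes are explicit. Rule-2 pairs with $w_1 = w_2$ are skipped, because the rule relates two distinct writes. For the last read of the second thread in Fig. 1b, the fixpoint orders write(z, 1) before the initial write of z, which closes a cycle.

```python
def closure(pairs):
    """Transitive closure by naive fixpoint iteration."""
    rel = set(pairs)
    while True:
        new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c} - rel
        if not new:
            return rel
        rel |= new


def hb_for(o, ops, po_plus, co, writer):
    """Saturate hb_o: seed with rule 1, then add rule-2 and transitive pairs."""
    before_o = {a for (a, b) in co if b == o}
    rel = {(a, b) for (a, b) in co if a in before_o and (b in before_o or b == o)}
    own_reads = [r for r, op in enumerate(ops)
                 if op[0] == "r" and (r == o or (r, o) in po_plus)]
    while True:
        new = {(w1, writer[r]) for r in own_reads for (w1, r2) in rel
               if r2 == r and ops[w1][0] == "w" and ops[w1][1] == ops[r][1]
               and w1 != writer[r]}                    # rule 2, distinct writes
        new |= {(a, d) for (a, b) in rel for (c, d) in rel if b == c}
        new -= rel
        if not new:
            return rel                                 # transitive fixpoint
        rel |= new


def hb_acyclic(ops, po, wr):
    """Second half of CM: every hb_o is acyclic (CC is checked separately)."""
    po_plus = closure(po)
    co = closure(set(po) | set(wr))
    writer = {r: w for (w, r) in wr}
    return all(a != b for o in range(len(ops))
               for (a, b) in hb_for(o, ops, po_plus, co, writer))


fig1a = [("w", "x", 0), ("w", "x", 1), ("r", "x", 2), ("w", "x", 2), ("r", "x", 1)]
fig1a_po, fig1a_wr = {(0, 1), (0, 3), (1, 2), (3, 4)}, {(3, 2), (1, 4)}
fig1b = [("w", "x", 0), ("w", "y", 0), ("w", "z", 0),                 # initial
         ("w", "z", 1), ("w", "x", 1), ("w", "y", 1),                 # t0
         ("w", "x", 2), ("r", "z", 0), ("r", "y", 1), ("r", "x", 2)]  # t1
fig1b_po = ({(i, j) for i in (0, 1, 2) for j in (3, 6)}
            | {(3, 4), (4, 5), (6, 7), (7, 8), (8, 9)})
fig1b_wr = {(2, 7), (5, 8), (6, 9)}
print(hb_acyclic(fig1a, fig1a_po, fig1a_wr))  # True
print(hb_acyclic(fig1b, fig1b_po, fig1b_wr))  # False
```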

Bouajjani et al. [6] show that the problem of checking whether a history 
satisfies CC, CCv, or CM can be decided in polynomial time. This result is a straightforward
consequence of the above definitions, since the union of relations required to be 
acyclic can be computed in polynomial time from the relations po and wr which 
are fixed in a given history. In particular, the union of these relations can be 
computed by a DATALOG program. 
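The CCv condition has the same shape. The sketch below uses a hypothetical encoding of histories (operation ids mapped to (kind, variable, value) triples, po and wr as sets of pairs) and assumes the usual form of the conflict relation: cf[R] orders write(x, v) before write(x, v') whenever write(x, v) is R-before a read of x that takes its value from write(x, v'). It tests the acyclicity of $po \cup wr \cup cf[co]$, the condition used for CCv in the proof of Lemma 1, evaluating the union with plain Python sets instead of DATALOG.

```python
def transitive_closure(rel):
    """Naive least fixpoint computing R^+ for a finite relation."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def is_acyclic(rel):
    return all(a != b for (a, b) in transitive_closure(rel))


def conflict(ops, wr, rel):
    """Assumed cf[R]: write(x,v) -> write(x,v') when write(x,v) is R-before a read
    of x that takes its value from the different write write(x,v')."""
    return {(w1, w2) for (w1, r) in rel for (w2, r2) in wr
            if r2 == r and w1 != w2 and ops[w1][0] == "w" and ops[w1][1] == ops[r][1]}


def satisfies_ccv(ops, po, wr):
    """CCv as the acyclicity of po ∪ wr ∪ cf[co], with co = (po ∪ wr)^+."""
    co = transitive_closure(po | wr)
    return is_acyclic(po | wr | conflict(ops, wr, co))
```

On a history where one thread writes x = 1 then x = 2 and another thread reads 2 then 1, the two writes end up in conflict in both directions, so the check fails; if the reader sees 1 then 2, it passes.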


3.2 Convergent Causal Memory 


We define a new variation of causal consistency which builds on causal memory but, similar to causal convergence, enforces that all threads agree on an order in which to observe values written by concurrent (causally unrelated) writes; it also uses a larger read-write relation. A history (O, po, wr) satisfies convergent causal memory (CCM, for short) if $po \cup wr \cup pww \cup rw[pww]$ is acyclic, where the partial store order pww is defined by


$$pww = (hb_{ww} \cup cf[hb])^+ \quad\text{with}\quad hb = \Big(\bigcup_{o \in O} hb_o\Big)^+$$

(for a relation R, $R_{ww}$ denotes its restriction to pairs of writes on the same variable).


The partial store order pww contains the ordering constraints between writes in all the relations $hb_o$ used to define causal memory, and also the conflict relation induced by this set of constraints (a weaker version of the conflict relation was used to define causal convergence).

As a first result, we show that all the variations of causal consistency in 
Sect. 3.1, i.e., CC, CCv and CM, are strictly weaker than CCM. 


Lemma 1. If a history satisfies CCM, then it satisfies CC, CCv and CM.


Proof. Let h = (O, po, wr) be a history satisfying CCM. By the definition of hb, we have that $co_{ww} \subseteq hb_{ww}$. Indeed, any two writes $o_1$ and $o_2$ related by co are also related by $hb_{o_2}$, which by the definition of hb implies that they are related by $hb_{ww}$. Then, by the definition of pww, we have that $hb_{ww} \subseteq pww$. This implies that $rw[co] \subseteq rw[pww]$ (by definition, $rw[co] = rw[co_{ww}]$). Therefore, the acyclicity of $po \cup wr \cup pww \cup rw[pww]$ implies that its subset $po \cup wr \cup rw[co]$ is also acyclic, which means that h satisfies CC. Also, it implies that $po \cup wr \cup cf[hb]$ is acyclic (the last term of the union is included in pww), which, by $co \subseteq hb$, implies that $po \cup wr \cup cf[co]$ is acyclic, and thus, h satisfies CCv. The fact that h satisfies CM follows from the fact that h satisfies CC (since $po \cup wr$ is acyclic) and hb is acyclic ($hb_{ww}$ is included in pww and the rest of the dependencies in hb are included in $(po \cup wr)^+$).


The converse of the above lemma does not hold. Figure 1c shows a history which satisfies CM and CCv, but not CCM. To show that this history does not satisfy CCM, we use the fact that pww relates any two writes which are ordered by program order. Then, we get that read(x, 1) and write(x, 2) are related by rw[pww] (because write(x, 1) is related by write-read with read(x, 1)), which further implies that $(read(x, 1), read(y, 1)) \in rw[pww] \circ po$. Similarly, we have that $(read(y, 1), read(x, 1)) \in rw[pww] \circ po$, which implies that $po \cup wr \cup pww \cup rw[pww]$ is cyclic, and therefore, the history does not satisfy CCM. The fact that this history satisfies CM and CCv follows easily from the definitions.

Next, we show that CCM is weaker than SC, which will be important in our 
algorithm for checking whether a history satisfies SC. 


Lemma 2. If a history satisfies SC, then it satisfies CCM. 


Proof. Let h = (O, po, wr) be a history satisfying SC. Then, there exists a store order ww such that $po \cup wr \cup ww \cup rw[ww]$ is acyclic. We show that the two relations $hb_{ww}$ and $cf[hb]$, whose union generates pww, are both included in ww. We first prove that $hb \subseteq (po \cup wr \cup ww \cup rw[ww])^+$ by structural induction on the definition of $hb_o$:


1. if $(o_1, o_2) \in co = (po \cup wr)^+$, then clearly, $(o_1, o_2) \in (po \cup wr \cup ww \cup rw[ww])^+$,

2. if $(write(x, v), read(x, v')) \in (po \cup wr \cup ww \cup rw[ww])^+$ and there is read(x, v') such that $(write(x, v'), read(x, v')) \in wr$, then $(write(x, v), write(x, v')) \in ww$. Otherwise, assuming by contradiction that $(write(x, v'), write(x, v)) \in ww$, we get that $(read(x, v'), write(x, v)) \in rw[ww]$ (by the definition of rw[ww], using the hypothesis $(write(x, v'), read(x, v')) \in wr$). Together with $(write(x, v), read(x, v')) \in (po \cup wr \cup ww \cup rw[ww])^+$, this implies that $po \cup wr \cup ww \cup rw[ww]$ is cyclic, a contradiction.
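Since $hb \subseteq (po \cup wr \cup ww \cup rw[ww])^+$, and since ww totally orders the writes on each variable while $po \cup wr \cup ww \cup rw[ww]$ is acyclic, any two writes on the same variable related by hb are ordered in the same way by ww; hence $hb_{ww} \subseteq ww$. Also, since $cf[(po \cup wr \cup ww \cup rw[ww])^+] \subseteq (po \cup wr \cup ww \cup rw[ww])^+$ (using a similar argument as in point (2) above), we get that $cf[hb] \subseteq (po \cup wr \cup ww \cup rw[ww])^+$, and since cf[hb] only relates writes on the same variable, $cf[hb] \subseteq ww$.

Finally, since $pww \subseteq ww$, we get that $(po \cup wr \cup pww \cup rw[pww])^+ \subseteq (po \cup wr \cup ww \cup rw[ww])^+$, so the acyclicity of the latter implies the acyclicity of the former. Therefore, h satisfies CCM.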




The converse of the above lemma does not hold. For instance, the history in Fig. 1d is not SC, but it is CCM. This history admits a partial store order pww where the writes in different threads are not ordered.


Fig. 2. Relationships between consistency models. Directed arrows denote the “weaker- 
than” relation while dashed lines connect incomparable models. 


The left side of Fig. 2 (ignoring wCCM and TSO) summarizes the relation- 
ships between the consistency models presented in this section. 

The partial store order pww can be computed in polynomial time (in the size of the input history). Indeed, the $hb_o$ relations can be computed using a least fixpoint calculation that converges in at most a quadratic number of iterations, and acyclicity can be decided in polynomial time. Therefore,


Theorem 1. Checking whether a history satisfies CCM is polynomial time in 
the size of the history. 


3.3 An Algorithm for Checking Sequential Consistency 


Algorithm 1 checks whether a given history satisfies sequential consistency. As a first step, it checks whether the given history satisfies CCM. If this is not the case, then, by Lemma 2, the history does not satisfy SC either, and the algorithm returns false. Otherwise, it enumerates store orders which extend the partial store order pww, until finding one that witnesses the satisfaction of SC. The history violates SC iff no such store order is found. The soundness of this last step follows from the proof of Lemma 2, which shows that pww is included in any store order ww witnessing SC satisfaction.


Theorem 2. Algorithm 1 returns true iff the input history h satisfies SC. 
Input: A history h = (O, po, wr)
Output: true iff h satisfies SC

if $po \cup wr \cup pww \cup rw[pww]$ is cyclic then
    return false;
end
foreach $ww \supseteq pww$ do
    if $po \cup wr \cup ww \cup rw[ww]$ is acyclic then
        return true;
    end
end
return false;


Algorithm 1. Checking SC conformance.
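Algorithm 1 can be sketched in Python as follows. The function takes pww as an input (for instance, as computed from the CCM definition) and enumerates store orders, i.e., one total order per variable on its writes, that contain pww. Histories are encoded hypothetically as a map from operation ids to (kind, variable, value) triples plus po and wr as sets of pairs, and rw[R] is assumed to relate read(x, v) to every write that R orders after the write read by read(x, v). Permutation-based enumeration stands in for whatever enumeration strategy an implementation uses; it is exponential in the number of writes per variable, which is why pruning by pww matters.

```python
from itertools import permutations, product


def transitive_closure(rel):
    """Naive least fixpoint computing R^+."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def is_acyclic(rel):
    return all(a != b for (a, b) in transitive_closure(rel))


def rw_of(ops, wr, rel):
    """Assumed rw[R]: read(x,v) -> write(x,v') when read(x,v) reads from a write R-before write(x,v')."""
    return {(r, w2) for (w, r) in wr for (w1, w2) in rel
            if w1 == w and ops[w2][0] == "w" and ops[w2][1] == ops[w][1]}


def store_orders_extending(ops, partial):
    """Yield every store order (one total order per variable on its writes) containing `partial`."""
    writes_by_var = {}
    for op_id, (kind, var, _) in ops.items():
        if kind == "w":
            writes_by_var.setdefault(var, []).append(op_id)
    choices = []
    for writes in writes_by_var.values():
        local = {(a, b) for (a, b) in partial if a in writes and b in writes}
        orders = []
        for perm in permutations(writes):
            total = {(perm[i], perm[j]) for i in range(len(perm)) for j in range(i + 1, len(perm))}
            if local <= total:  # keep only linear extensions of the partial order
                orders.append(total)
        choices.append(orders)
    for combo in product(*choices):
        yield set().union(*combo)


def check_sc(ops, po, wr, pww):
    """Algorithm 1 for a precomputed pww."""
    if not is_acyclic(po | wr | pww | rw_of(ops, wr, pww)):
        return False  # not CCM, hence not SC (Lemma 2)
    return any(is_acyclic(po | wr | ww | rw_of(ops, wr, ww))
               for ww in store_orders_extending(ops, pww))
```

With pww ordering write(x, 1) before write(x, 2), a history where another thread reads 1 then 2 is accepted after trying a single store order; a history where the reader sees 2 then 1 is rejected even when pww is left empty, since both candidate store orders create a cycle.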


4 Checking Conformance to the TSO Model 


We now consider the problem of checking whether a history satisfies TSO. Following the approach developed above for SC, we define a polynomial time checkable criterion, based on a (different) variation of causal consistency that is suitable for the case of TSO. This allows us to reduce the number of pairs of writes for which an order must be guessed in order to establish conformance to TSO.

The case of TSO requires the definition of a new intermediary consistency model because CCM is based on a causality order that includes the program order po, which is relaxed in TSO compared to SC. Indeed, CCM is not weaker than TSO, as shown by the history in Fig. 1b (note that this does not imply that the other variations of causal consistency, CC and CCv, are also not weaker than TSO). This history satisfies TSO because, based on its operational model, the operation write(x, 2) of thread t1 can be delayed (pending in the store buffer of t1) until the end of the execution. Therefore, after executing read(z, 0), all the writes of thread t0 are committed to the main memory so that thread t1 can read 1 from y and 2 from x (it is obliged to read the value of x from its own store buffer). This history is not admitted by CCM because it is not admitted by the weaker causal consistency variation CM. Figure 3 shows a history admitted by CCM but not by TSO. Indeed, under TSO, both t2 and t3 should see the writes on x and y, performed by t0 and t1 respectively, in the same order. This is not the case, because t2 “observes” the write on x before the write on y (since it reads 0 from y) and t3 “observes” the write on y before the write on x (since it reads 0 from x). This history is admitted by CCM because the two writes are causally independent and they concern different variables. We mention that TSO and CM are also incomparable. For instance, the history in Fig. 1a is allowed by CM, but not by TSO. The history in Fig. 1b is admitted by TSO, but not by CM.

Next, we define a weakening of CCM, called weak convergent causal memory 
(wCCM), which is also weaker than TSO. The model wCCM is based on causality 
relations induced by the relaxed program orders ppo and po-loc instead of po, 
and the external write-read relation instead of the full write-read relation. 
t0:           t1:           t2:           t3:
write(x, 1)   write(y, 1)   read(x, 1)    read(y, 1)
                            read(y, 0)    read(x, 0)


Fig. 3. A history admitted by wCCM and CCM but not by TSO.


4.1 Weak Convergent Causal Memory 


First, we define two causality relations relative to the partial program orders in the definition of TSO and the external write-read relation: for $\pi \in \{ppo, po\text{-}loc\}$, let $co^\pi = (\pi \cup wr_e)^+$. We also consider a notion of conflict that is defined in terms of the external write-read relation as follows: for a given relation R, let $cf_e[R] = R_{wr} \circ wr_e^{-1}$.
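As an illustration of the external conflict relation, the sketch below computes $cf_e[R]$ for a relation R given as a set of pairs, reading $R_{wr}$ as the restriction of R to pairs (write(x, v), read(x, v')) on the same variable (our interpretation of the notation). The thread map and the (kind, variable, value) encoding of operations are hypothetical. The point is that a read served by a write of its own thread, i.e., an internal write-read pair, produces no conflict.

```python
def external_wr(wr, thread):
    """wr_e: write-read pairs whose write and read belong to different threads."""
    return {(w, r) for (w, r) in wr if thread[w] != thread[r]}


def cf_e(ops, thread, wr, rel):
    """cf_e[R] = R_wr ∘ wr_e^{-1}: w1 -> w2 when w1 is R-before a read of the same
    variable that takes its value, from another thread, from a different write w2."""
    wr_e = external_wr(wr, thread)
    return {(w1, w2) for (w1, r) in rel for (w2, r2) in wr_e
            if r2 == r and w1 != w2
            and ops[w1][0] == "w" and ops[r][0] == "r" and ops[w1][1] == ops[r][1]}
```

For instance, if write(x, 1) of thread t0 is R-before a read of x in t0 that reads 2 from a write of thread t1, the two writes are in conflict; if the same read were performed by t1 itself, the pair would be internal and no conflict would arise.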

Then, given a history (O, po, wr), we define for each operation o two happens-before relations $hb_o^{ppo}$ and $hb_o^{po\text{-}loc}$. The definition of these relations is similar to the one of $hb_o$ (from causal memory), the differences being that po is replaced by ppo and po-loc, respectively, co is replaced by $co^{ppo}$ and $co^{po\text{-}loc}$, respectively, and wr is replaced by $wr_e$. Therefore, for $\pi \in \{ppo, po\text{-}loc\}$, $hb_o^\pi$ is the smallest transitive relation such that:


1. $(o_1, o_2) \in hb_o^\pi$ if $(o_1, o_2) \in co^\pi$, $(o_1, o) \in co^\pi$, and $(o_2, o) \in (co^\pi)^*$, and
2. $(write(x, v), write(x, v')) \in hb_o^\pi$ if $(write(x, v), read(x, v')) \in hb_o^\pi$, $(write(x, v'), read(x, v')) \in wr_e$, and $(read(x, v'), o) \in \pi^*$, for some read(x, v').


Let $hb^\pi = (\bigcup_{o \in O} hb_o^\pi)^+$, for $\pi \in \{ppo, po\text{-}loc\}$, and let $whb = (hb^{ppo} \cup hb^{po\text{-}loc})^+$. Then, the weak partial store order is defined as follows:


$$wpww = (whb_{ww} \cup cf_e[hb^{po\text{-}loc}] \cup cf_e[hb^{ppo}])^+$$


Then, we say that a history (O, po, wr) satisfies weak convergent causal memory 
(wCCM) if both relations: 


$ppo \cup wr_e \cup wpww \cup rw[wpww]$ and $po\text{-}loc \cup wr_e \cup wpww \cup rw[wpww]$


are acyclic. 
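Given wpww, checking wCCM amounts to two acyclicity tests. The sketch below takes ppo, po-loc, wr and wpww as sets of pairs over a hypothetical (kind, variable, value) encoding of operations, together with a hypothetical thread map used to extract $wr_e$. It assumes that rw[wpww] is built from the full write-read relation and that reads of initial values have no incoming wr edge (and therefore no rw edge).

```python
def transitive_closure(rel):
    """Naive least fixpoint computing R^+."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def is_acyclic(rel):
    return all(a != b for (a, b) in transitive_closure(rel))


def rw_of(ops, wr, rel):
    """Assumed rw[R]: read(x,v) -> write(x,v') when read(x,v) reads from a write R-before write(x,v')."""
    return {(r, w2) for (w, r) in wr for (w1, w2) in rel
            if w1 == w and ops[w2][0] == "w" and ops[w2][1] == ops[w][1]}


def satisfies_wccm(ops, thread, ppo, po_loc, wr, wpww):
    """Both acyclicity conditions of wCCM, for a precomputed wpww."""
    wr_e = {(w, r) for (w, r) in wr if thread[w] != thread[r]}  # external write-read
    rw = rw_of(ops, wr, wpww)  # assumption: rw[.] is built from the full wr relation
    return (is_acyclic(ppo | wr_e | wpww | rw)
            and is_acyclic(po_loc | wr_e | wpww | rw))
```

Under these assumptions, the history of Fig. 3 passes both tests (it has a single write per variable, so wpww is empty), whereas a history where one thread writes x = 1 then x = 2 and another reads 2 then 1 fails them.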
Lemma 3. If a history satisfies TSO, then it satisfies wCCM. 


Proof. Let h = (O, po, wr) be a history satisfying TSO. Then, there exists a store order ww such that $po\text{-}loc \cup wr_e \cup ww \cup rw$ and $ppo \cup wr_e \cup ww \cup rw$ are both acyclic. The fact that


$$hb^{po\text{-}loc} \subseteq (po\text{-}loc \cup wr_e \cup ww \cup rw)^+ \quad\text{and}\quad hb^{ppo} \subseteq (ppo \cup wr_e \cup ww \cup rw)^+$$


can be proved by structural induction as in the case of SC (the step of the proof showing that $hb \subseteq (po \cup wr \cup ww \cup rw[ww])^+$). Then, since ww is a total order on writes on the same variable, we get that the projection of whb (the transitive closure of the union of $hb^{po\text{-}loc}$ and $hb^{ppo}$) on pairs of writes on the same variable is included in ww. Therefore, $whb_{ww} \subseteq ww$. Then, since $cf_e[R^\pi] \subseteq R^\pi$ for each $R^\pi = (\pi \cup wr_e \cup ww \cup rw)^+$ with $\pi \in \{ppo, po\text{-}loc\}$, and since each $cf_e[R^\pi]$ relates only writes on the same variable, we get that each $cf_e[R^\pi]$ is included in ww. This implies that $wpww \subseteq ww$.

Finally, since $wpww \subseteq ww$, we get that $(\pi \cup wr_e \cup wpww \cup rw[wpww])^+ \subseteq (\pi \cup wr_e \cup ww \cup rw[ww])^+$, for each $\pi \in \{ppo, po\text{-}loc\}$. In each case, the acyclicity of the latter implies the acyclicity of the former. Therefore, h satisfies wCCM.


Input: A history h = (O, po, wr)
Output: true iff h satisfies TSO

if $ppo \cup wr_e \cup wpww \cup rw[wpww]$ or $po\text{-}loc \cup wr_e \cup wpww \cup rw[wpww]$ is cyclic then
    return false;
end
foreach $ww \supseteq wpww$ do
    if $ppo \cup wr_e \cup ww \cup rw[ww]$ and $po\text{-}loc \cup wr_e \cup ww \cup rw[ww]$ are acyclic then
        return true;
    end
end
return false;


Algorithm 2. Checking TSO conformance.
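Algorithm 2 differs from Algorithm 1 only in its acyclicity conditions, so a Python sketch is almost identical: it takes wpww as an input, enumerates per-variable total orders on writes that contain it, and tests both conditions for each candidate. The (kind, variable, value) encoding and the thread map are hypothetical, and rw[ww] is assumed to be built from the full write-read relation.

```python
from itertools import permutations, product


def transitive_closure(rel):
    """Naive least fixpoint computing R^+."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def is_acyclic(rel):
    return all(a != b for (a, b) in transitive_closure(rel))


def rw_of(ops, wr, rel):
    """Assumed rw[R]: read(x,v) -> write(x,v') when read(x,v) reads from a write R-before write(x,v')."""
    return {(r, w2) for (w, r) in wr for (w1, w2) in rel
            if w1 == w and ops[w2][0] == "w" and ops[w2][1] == ops[w][1]}


def store_orders_extending(ops, partial):
    """Yield every store order (one total order per variable on its writes) containing `partial`."""
    writes_by_var = {}
    for op_id, (kind, var, _) in ops.items():
        if kind == "w":
            writes_by_var.setdefault(var, []).append(op_id)
    choices = []
    for writes in writes_by_var.values():
        local = {(a, b) for (a, b) in partial if a in writes and b in writes}
        choices.append([total for perm in permutations(writes)
                        for total in [{(perm[i], perm[j]) for i in range(len(perm))
                                       for j in range(i + 1, len(perm))}]
                        if local <= total])
    for combo in product(*choices):
        yield set().union(*combo)


def check_tso(ops, thread, ppo, po_loc, wr, wpww):
    """Algorithm 2 for a precomputed wpww."""
    wr_e = {(w, r) for (w, r) in wr if thread[w] != thread[r]}  # external write-read

    def both_acyclic(order):
        rw = rw_of(ops, wr, order)  # assumption: rw[.] uses the full wr relation
        return is_acyclic(ppo | wr_e | order | rw) and is_acyclic(po_loc | wr_e | order | rw)

    if not both_acyclic(wpww):
        return False  # not wCCM, hence not TSO (Lemma 3)
    return any(both_acyclic(ww) for ww in store_orders_extending(ops, wpww))
```

For example, if two threads each write x once and two other threads read x twice, the sketch accepts the history when both readers see the writes in the same order and rejects it when they see them in opposite orders, since no single store order on x explains both.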


The converse of the above lemma does not hold. Indeed, it can be easily seen that wCCM is weaker than CCM (since wpww is included in pww), and the history in Fig. 3, which satisfies CCM but not TSO (as explained at the beginning of the section), is also an example of a history that satisfies wCCM but not TSO. Then, wCCM is incomparable to CM. For instance, the history in Fig. 1b is allowed by wCCM (since it is allowed by TSO, as explained at the beginning of the section) but not by CM. Also, since CCM is stronger than CM, the history in Fig. 3 satisfies CM but not wCCM (since it does not satisfy TSO). These relationships are summarized in Fig. 2. Establishing the precise relation between CC/CCv and TSO is hard because they are defined using one, resp., two, acyclicity conditions. We believe that CC and CCv are weaker than TSO, but we do not have a formal proof.

Finally, it can be seen that, similarly to pww, the weak partial store order 
wpww can be computed in polynomial time, and therefore: 


Theorem 3. Checking whether a history satisfies wCCM is polynomial time in 
the size of the history. 

4.2 An Algorithm for Checking TSO Conformance


The algorithm for checking TSO conformance for a given history is given in Algorithm 2. It starts by checking whether the history violates the weaker consistency model wCCM. If yes, it returns false. If not, it enumerates the orders between the writes that are not related by the weak partial store order wpww until it finds one that allows establishing TSO conformance, in which case it returns true. Otherwise, it returns false.


Theorem 4. Algorithm 2 returns true iff the input history h satisfies TSO. 


5 Experimental Evaluation 


To demonstrate the practical value of the theory developed in the previous sections, we evaluate the efficiency and scalability of our algorithms. We experiment with both the SC and the TSO algorithms, investigating their running time compared to a standard encoding of these models into Boolean satisfiability, on a benchmark obtained by running realistic cache coherence protocols within the Gem5 simulator [5] in system emulation mode.

Histories are generated with random clients of the following cache coherence protocols included in the Gem5 distribution: MI, MOESI Hammer, MESI Two Level, and MOESI AMD Base. The randomization process is parametrized by the number of cpus (threads) and the total number of read/write operations. We ensure that every value is written at most once.

We have compared two variations of our algorithms for checking SC/TSO with a standard encoding of SC/TSO into Boolean satisfiability (named X-SAT, where X is SC or TSO). The two variations differ in the way in which the partial store order pww dictated by CCM is completed to a total store order ww as required by SC/TSO: either using standard enumeration (named X-CCM+Enum, where X is SC or TSO) or using a SAT solver (named X-CCM+SAT, where X is SC or TSO).

The computation of the partial store order pww is done using an encoding of its definition into a DATALOG program. The inductive definition of $hb_o$ supports an easy translation to DATALOG rules, and the same holds for the union of two relations, or their composition. We used Clingo [19] to run DATALOG programs.


5.1 Checking SC 


Figure 4 reports on the running time of the three algorithms while increasing the number of operations or cpus. All the histories considered in this experiment satisfy SC. This is intended because valid histories force our algorithms to enumerate extensions of the partial store order (SC violations may be detected while checking CCM). The graph on the left pictures the evolution of the running time when increasing the number of operations from 100 to 500, in increments of 100 (while using a constant number of 4 cpus). For each number of operations, we have considered 200 histories and computed the average running time. The graph on the right shows the running time when increasing the number of cpus from 2 to 6, in increments of 1. For x cpus, we have limited the number of operations to 50x. As before, for each number of cpus, we have considered 200 histories and computed the average running time. As can be observed, our algorithms scale much better than the SAT encoding and, interestingly enough, the difference between an explicit enumeration of pww extensions and one using a SAT solver is not significant. Note that even small improvements on the average running time provide large speedups when taking into account the whole testing process, i.e., checking consistency for a possibly large number of (randomly-generated) executions. For instance, the work on McVerSi [13], which focuses on the complementary problem of finding clients that increase the probability of uncovering bugs, shows that exposing bugs in some realistic cache coherence implementations can require as much as 24 h of continuous testing.


Fig. 4. Checking SC for valid histories. (a) Checking SC while varying the number of operations (100 to 500 per trace). (b) Checking SC while varying the number of cpus (2 to 6 threads per trace). Both panels plot the duration (s) of SC-CCM+Enum, SC-CCM+SAT, and SC-SAT.

Since the bottleneck in our algorithms is the enumeration of pww extensions, we have measured the percentage of pairs of writes that are not ordered by pww. We considered a random sample of 200 histories (with 200 operations per history) and found this percentage to be just 6.6%, which is surprisingly low. This explains the net gain in comparison to a SAT encoding of SC, since the number of pww extensions that need to be enumerated is quite low. As a side remark, using CCv instead of CCM in the algorithms above leads to a drastic increase in the number of unordered writes: for the same random sample of 200 histories, using CCv instead of CCM leaves 57.75% of the pairs of writes unordered on average, which is considerably higher than the percentage of unordered writes when using CCM.
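The percentage above can be computed directly from pww. The sketch below assumes that the metric ranges over pairs of writes on the same variable, since only those must be ordered by a store order, and uses a hypothetical (kind, variable, value) encoding of operations. If k such pairs are left unordered, the enumeration has to examine at most $2^k$ orientations, which is why a small percentage translates into few pww extensions.

```python
from itertools import combinations


def unordered_write_ratio(ops, pww):
    """Fraction of same-variable write pairs that pww leaves unordered (assumed metric)."""
    writes_by_var = {}
    for op_id, (kind, var, _) in ops.items():
        if kind == "w":
            writes_by_var.setdefault(var, []).append(op_id)
    total = unordered = 0
    for writes in writes_by_var.values():
        for a, b in combinations(writes, 2):
            total += 1
            if (a, b) not in pww and (b, a) not in pww:
                unordered += 1
    return unordered / total if total else 0.0
```

For three writes on x, a pww that totally orders them gives a ratio of 0, while a pww containing a single pair leaves two of the three pairs unordered.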

We have also evaluated our algorithms on SC violations. These violations were generated by reordering statements from the MI implementation, e.g., swapping the order of the actions s_store_hit and p_profileHit in the transition transition(M, Store). As an optimization, our implementation gradually checks the weaker variations of causal consistency CC and CCv before checking CCM. This increases the chances of returning early in the case of a violation (a violation of CC/CCv is also a violation of CCM and SC). We have considered 1000 histories with 100 to 400 operations and 2 to 8 cpus, equally distributed across the numbers of cpus. Figure 5 reports on the evolution of the average running time. Since all these histories happen to be CCM violations, SC-CCM+Enum and SC-CCM+SAT have the same running time. As an evaluation of our optimization, we have found that 50% of the histories invalidate weaker variations of causal consistency, CC or CCv.


Gradual Consistency Checking 281


Fig. 5. Checking SC for invalid histories while increasing the number of cpus (2 to 8 threads per trace; duration (s) of SC-CCM+Enum, SC-CCM+SAT, and SC-SAT).


Fig. 6. Checking TSO for valid histories.


5.2 Checking TSO 


We have evaluated our TSO algorithms on the same set of histories used for SC in Fig. 4. Since these histories satisfy SC, they satisfy TSO as well. As in the case of SC, our algorithms scale better than the SAT encoding. However, unlike for SC, the enumeration of wpww extensions using a SAT solver outperforms the explicit enumeration. Since this difference was negligible in the case of SC, the SAT variation seems to be generally preferable.


6 Related Work 


While several static techniques have been developed to prove that a shared-memory implementation (or cache coherence protocol) satisfies SC [1, 4, 9-12, 17, 20, 23, 27, 28], few have addressed dynamic techniques such as testing and runtime verification (which scale to more realistic implementations). From the complexity standpoint, Gibbons and Korach [21] showed that checking whether a history is SC is NP-hard, while Alur et al. [4] showed that checking SC for finite-state shared-memory implementations (over a bounded number of threads, variables, and values) is undecidable. The fact that checking whether a history satisfies TSO is also NP-hard has been proved by Furbach et al. [18].

There are several works that addressed the testing problem for related cri- 
teria, e.g., linearizability. While SC requires that the operations in a history 
be explained by a linearization that is consistent with the program order, lin- 
earizability requires that such a linearization be also consistent with the real- 
time order between operations (linearizability is stronger than SC). The works 
in [25,30] describe monitors for checking linearizability that construct lineariza- 
tions of a given history incrementally, in an online fashion. This incremental con- 
struction cannot be adapted to SC since it strongly relies on the specificities of 
linearizability. Line-Up [8] performs systematic concurrency testing via schedule 
enumeration, and offline linearizability checking via linearization enumeration. 
The works in [15,16] show that checking linearizability for some particular class 
of ADTs is polynomial time. Emmi and Enea [14] consider the problem of check- 
ing weak consistency criteria, but their approach focuses on specific relaxations 
in those criteria, falling back to an explicit enumeration of linearizations in the 
context of a criterion like SC or TSO. Bouajjani et al. [6] consider the problem 
of checking causal consistency. They formalize the different variations of causal 
consistency we consider in this work and show that the problem of checking 
whether a history satisfies one of these variations is polynomial time. 

The complementary issue of test generation, i.e., finding clients that increase 
the probability of uncovering bugs in shared memory implementations, has been 
approached in the McVerSi framework [13]. Their methodology for checking a 
criterion like SC lies within the context of white-box testing, i.e., the user is 
required to annotate the shared memory implementation with events that define 
the store order in an execution. Our algorithms have the advantage that the 
implementation is treated as a black-box requiring less user intervention. 


7 Conclusion 


We have introduced an approach for checking the conformance of a computation 
to SC or to TSO, a problem known to be NP-hard. The idea is to avoid an explicit 
enumeration of the exponential number of possible total orders between writes in 
order to solve these problems. Our approach is to define weaker criteria that are as strong as possible but still polynomial time checkable. This is useful for (1) early detection of violations, and (2) reducing the number of pairs of writes for which an order must be found in order to check SC/TSO conformance. Morally, the approach consists in capturing an “as large as possible” partial order on writes that can be computed in polynomial time (using a least fixpoint calculation), and which is a subset of any total order witnessing SC/TSO conformance. Our experimental results show that this approach is indeed useful and performant: it allows catching most violations early using an efficient check, and it allows computing a large kernel of write constraints that significantly reduces the number of pairs of writes that are left to be ordered in an enumerative way. Future work consists in exploring the application of this approach to other correctness criteria that are hard to check, such as serializability in the context of transactional programs.
Abstract. Transactional access to databases is an important abstraction allowing programmers to consider blocks of actions (transactions) as executing in isolation. The strongest consistency model is serializability, which ensures the atomicity abstraction of transactions executing over a sequentially consistent memory. Since ensuring serializability carries a significant penalty on availability, modern databases provide weaker consistency models, one of the most prominent being snapshot isolation. In general, the correctness of a program relying on serializable transactions may be broken when using weaker models. However, certain programs may also be insensitive to consistency relaxations, i.e., all their properties holding under serializability are preserved even when they are executed over a weakly consistent database and without additional synchronization.

In this paper, we address the issue of verifying if a given program is robust against snapshot isolation, i.e., all its behaviors are serializable even if it is executed over a database ensuring snapshot isolation. We show that this verification problem is polynomial time reducible to a state reachability problem in transactional programs over a sequentially consistent shared memory. This reduction opens the door to the reuse of the classic verification technology for reasoning about weakly-consistent programs. In particular, we show that it can be used to derive a proof technique based on Lipton’s reduction theory that allows proving programs robust.


1 Introduction 


Transactions simplify concurrent programming by enabling computations on shared data that are isolated from other concurrent computations and resilient to failures. Modern databases provide transactions in various forms corresponding to different tradeoffs between consistency and availability. The strongest consistency level is achieved with serializable transactions [21] whose outcome in concurrent executions is the same as if the transactions were executed atomically in some order. Since serializability carries a significant penalty on availability, modern databases often provide weaker consistency models, one of the most prominent being snapshot isolation (SI) [5]. Then, an important issue is to ensure that the level of consistency needed by a given program coincides with the one that is guaranteed by its infrastructure, i.e., the database it uses. One way to tackle this issue is to investigate the problem of checking robustness of programs against consistency relaxations: given a program P and two consistency models S and W such that S is stronger than W, we say that P is robust for S against W if for every two implementations $I_S$ and $I_W$ of S and W, respectively, the set of computations of P when running with $I_S$ is the same as its set of computations when running with $I_W$. This means that P is not sensitive to the consistency relaxation from S to W, and therefore it is possible to reason about the behaviors of P assuming that it is running over S, and no additional synchronization is required when P runs over the weak model W for it to maintain all the properties it satisfies under S.
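Robustness can be illustrated on the Write Skew program of Fig. 1a by comparing outcomes. The sketch below is a simplification: it abstracts computations into final register values, hand-codes the two transactions as hypothetical Python functions, and, for two transactions only, models SI as either a serial run or an overlapping run in which both transactions read the initial snapshot and both commit if their write sets are disjoint (aborts are ignored).

```python
from itertools import permutations

INIT = {"x": 0, "y": 0}


# Hypothetical encoding of Write Skew: a transaction maps the snapshot it reads from
# to (register values, writes).
def t1(snapshot):
    return {"r1": snapshot["y"]}, {"x": 1}


def t2(snapshot):
    return {"r2": snapshot["x"]}, {"y": 1}


def serializable_outcomes(txns):
    """Register valuations reachable by running the transactions atomically in some order."""
    outcomes = set()
    for order in permutations(txns):
        db, regs = dict(INIT), {}
        for txn in order:
            reads, writes = txn(db)
            regs.update(reads)
            db.update(writes)
        outcomes.add(frozenset(regs.items()))
    return outcomes


def si_outcomes(ta, tb):
    """SI outcomes of two transactions: the serial runs, plus the overlapping run in which
    both read the initial snapshot and commit only if their write sets are disjoint."""
    outcomes = serializable_outcomes([ta, tb])
    reads_a, writes_a = ta(dict(INIT))
    reads_b, writes_b = tb(dict(INIT))
    if not set(writes_a) & set(writes_b):  # no write-write conflict: both commit
        outcomes.add(frozenset({**reads_a, **reads_b}.items()))
    return outcomes
```

The extra SI outcome r1 = r2 = 0 is not produced by any serial order, so under this abstraction Write Skew is not robust for serializability against SI.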

In this paper, we address the problem of verifying robustness of transactional programs for serializability, against snapshot isolation. Under snapshot isolation, any transaction t reads values from a snapshot of the database taken at its start, and t can commit only if no other committed transaction has written, since t started, to a location that t wrote to. Robustness is a form of program equivalence between two versions of the same program, obtained using two semantics, one more permissive than the other. It ensures that this permissiveness has no effect on the program under consideration. The difficulty in checking robustness is to apprehend the extra behaviors due to the relaxed model w.r.t. the strong model. This requires a priori reasoning about complex order constraints between operations in arbitrarily long computations, which may require maintaining unbounded ordered structures and make robustness checking hard or even undecidable.
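The two SI rules just stated, reading from the snapshot taken at the start and refusing to commit after a concurrent committed write to the same location, can be sketched with a toy multi-version store. The class and method names are hypothetical, commit timestamps come from a single counter, and every variable is assumed to have an initial value; this is an illustration of the rules, not a database implementation.

```python
class SnapshotStore:
    """Toy multi-version store illustrating snapshot isolation (hypothetical sketch)."""

    def __init__(self, init):
        self.versions = {x: [(0, v)] for x, v in init.items()}  # variable -> [(commit_ts, value)]
        self.clock = 0  # timestamp of the latest commit

    def begin(self):
        return {"start": self.clock, "writes": {}}

    def read(self, txn, x):
        if x in txn["writes"]:  # a transaction sees its own pending writes
            return txn["writes"][x]
        # Otherwise, read the latest version committed no later than the start of txn.
        return max((ts, v) for ts, v in self.versions[x] if ts <= txn["start"])[1]

    def write(self, txn, x, v):
        txn["writes"][x] = v  # buffered, invisible to other transactions until commit

    def commit(self, txn):
        # Refuse to commit if another transaction committed a write to one of our
        # locations after txn started (write-write conflict).
        for x in txn["writes"]:
            if any(ts > txn["start"] for ts, _ in self.versions[x]):
                return False
        self.clock += 1
        for x, v in txn["writes"].items():
            self.versions[x].append((self.clock, v))
        return True
```

Running the two Write Skew transactions concurrently on this store, both commit and both reads return 0, while two concurrent transactions writing the same location cannot both commit.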

Our first contribution is to show that verifying robustness of transactional programs against snapshot isolation can be reduced in polynomial time to the reachability problem in concurrent programs under sequential consistency (SC). This allows us (1) to avoid explicitly handling the snapshots from which transactions read along computations (since this may imply memorizing unbounded information), and (2) to leverage available tools for verifying invariants/reachability problems on concurrent programs. This also implies that the robustness problem is decidable for finite-state programs, PSPACE-complete when the number of sites is fixed, and EXPSPACE-complete otherwise. This is the first result on the decidability and complexity of the problem of verifying robustness in the context of transactional programs. The problem of verifying robustness has been considered in the literature for several models, including eventual and causal consistency [6, 10-12, 20]. These works provide (over- or under-)approximate analyses for checking robustness, but none of them provides precise (sound and complete) algorithmic verification methods for solving this problem.

Based on this reduction, our second contribution is a proof methodology 
for establishing robustness which builds on Lipton’s reduction theory [18]. We 
use the theory of movers to establish whether the relaxations allowed by SI are 
harmless, i.e., they don’t introduce new behaviors compared to serializability. 
We applied the proposed verification techniques to 10 challenging applications extracted from previous work [2, 6, 11, 14, 16, 19, 24]. These techniques were enough for proving or disproving the robustness of these applications.

Complete proofs and more details can be found in [4]. 


p1:                           p2:
t1: [ r1 = y;  // 0           t2: [ r2 = x;  // 0
      x = 1 ]                       y = 1 ]

(a) Write Skew (WS).          (b) A WS execution trace.


Fig. 1. Examples of non-robust programs illustrating the difference between SI and serializability. Causal dependency means that a read in a transaction obtains its value from a write in another transaction. Conflict means that a write in a transaction is not visible to a read in another transaction, but it would affect the read value if it were visible. Here, happens-before is the union of the two.


2 Overview 


In this section, we give an overview of our approach for checking robustness against snapshot isolation. While serializability enforces that transactions are atomic and that conflicting transactions, i.e., transactions which read or write a common location, cannot commit concurrently, SI [5] allows conflicting transactions to commit in parallel as long as they do not contain a write-write conflict, i.e., writes on a common location. Moreover, under SI, each transaction reads from a snapshot of the database taken at its start. These relaxations permit the “anomaly” known as Write Skew (WS) shown in Fig. 1a, where an anomaly is a program execution which is allowed by SI, but not by serializability. The execution of Write Skew under SI allows the reads of x and y to return 0, although this cannot happen under serializability. These values are possible since each transaction is executed locally (starting from the initial snapshot) without observing the writes of the other transaction.


Execution Trace. Our notion of program robustness is based on an abstract representation of executions called traces. Informally, an execution trace is a set
of events, i.e., accesses to shared variables and transaction begin/commit events, 
along with several standard dependency relations between events recording the 
data-flow. The transitive closure of the union of all these dependency relations 
is called happens-before. An execution is an anomaly if the happens-before of its 
trace is cyclic. Figure 1b shows the happens-before of the Write Skew anomaly. 
Notice that the happens-before order is cyclic in both cases. 

Semantically, every transaction execution involves two main events, the issue and the commit. The issue event corresponds to a sequence of reads and/or writes where the writes are visible only to the current transaction. We interpret it as a single event since a transaction starts with a database snapshot that it updates in isolation, without observing other concurrently executing transactions. The commit event is where the writes are propagated and made visible to all processes. Under serializability, the two events coincide, i.e., they are adjacent in the execution. Under SI, this is not the case: in between the issue and the commit of the same transaction, we may have issue/commit events from concurrent transactions. When a transaction commit does not occur immediately after its issue, we say that the underlying transaction is delayed. For example, the following execution of WS corresponds to the happens-before cycle in Fig. 1b, where the write to x was committed after t2 finished; hence, t1 was delayed:


begin(p1, t1) · ld(p1, t1, y, 0) · isu(p1, t1, x, 1) · begin(p2, t2) · ld(p2, t2, x, 0) · isu(p2, t2, y, 1) · com(p2, t2) · com(p1, t1)


Above, begin(p1, t1) stands for starting a new transaction t1 by process p1, ld represents read (load) actions, while isu denotes write actions that are visible only to the current transaction (not yet committed). The writes performed during t1 become visible to all processes once the commit event com(p1, t1) takes place.
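A minimal sketch of how delayed transactions can be spotted in such an event sequence. The tuple encoding of events and the function name `delayed_transactions` are hypothetical choices made for this illustration; a transaction is reported as delayed when an event of another transaction occurs between its begin and its commit.

```python
# Events are encoded as tuples (hypothetical encoding):
#   ("begin", p, t), ("ld", p, t, x, v), ("isu", p, t, x, v), ("com", p, t)

def delayed_transactions(execution):
    """Return the transactions whose commit does not immediately follow their
    issue, i.e., some event of another transaction occurs between their begin
    and com events."""
    begin_at, com_at = {}, {}
    for i, ev in enumerate(execution):
        kind, t = ev[0], ev[2]
        if kind == "begin":
            begin_at[t] = i
        elif kind == "com":
            com_at[t] = i
    delayed = set()
    for t, b in begin_at.items():
        c = com_at.get(t)
        if c is not None and any(ev[2] != t for ev in execution[b:c]):
            delayed.add(t)
    return delayed

# The WS execution above: t1 commits only after t2 has finished.
ws_execution = [
    ("begin", "p1", "t1"), ("ld", "p1", "t1", "y", 0), ("isu", "p1", "t1", "x", 1),
    ("begin", "p2", "t2"), ("ld", "p2", "t2", "x", 0), ("isu", "p2", "t2", "y", 1),
    ("com", "p2", "t2"), ("com", "p1", "t1"),
]
print(delayed_transactions(ws_execution))  # {'t1'}
```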


Reducing Robustness to SC Reachability. The above SI execution can be mimicked by an execution of the same program under serializability modulo an instrumentation that simulates the delayed transaction. The local writes in the issue event are simulated by writes to auxiliary registers, and the commit event is replaced by copying the values from the auxiliary registers to the shared variables (actually, it is not necessary to simulate the commit event; we include it here for presentation reasons). The auxiliary registers are visible only to the delayed transaction. For the execution to be an anomaly (i.e., not possible under serializability without the instrumentation), the issue and the commit events of the delayed transaction must be linked by a chain of happens-before dependencies. For instance, the above execution of WS can be simulated by:


begin(p1, t1) · ld(p1, t1, y, 0) · st(p1, t1, rx, 1) · begin(p2, t2) · ld(p2, t2, x, 0) · isu(p2, t2, y, 1) · com(p2, t2) · st(p1, t1, x, rx)


The write to x was delayed by storing the value in the auxiliary register rx, and the happens-before chain exists because the read on y done by t1 conflicts with the write on y from t2, and the read on x by t2 conflicts with the write of x in the simulation of t1’s commit event. On the other hand, the following execution of Write Skew without the read on y in t1:


begin(p1, t1) · st(p1, t1, rx, 1) · begin(p2, t2) · ld(p2, t2, x, 0) · isu(p2, t2, y, 1) · com(p2, t2) · st(p1, t1, x, rx)


misses the conflict (happens-before dependency) between the issue event of t1 and t2. Therefore, the events of t2 can be reordered to the left of t1 to obtain an equivalent execution where st(p1, t1, x, rx) occurs immediately after st(p1, t1, rx, 1). In this case, t1 is no longer delayed, and this execution is possible under serializability (without the instrumentation).
If the number of transactions to be delayed in order to expose an anomaly is unbounded, the instrumentation described above may need an unbounded number of auxiliary registers. This would make the verification problem hard or even undecidable. However, we show that it is actually enough to delay a single transaction, i.e., a program admits an anomaly under SI iff it admits an anomaly containing a single delayed transaction. This result implies that the number of auxiliary registers needed by the instrumentation is bounded by the number of program variables, and that checking robustness against SI can be reduced in linear time to a reachability problem under serializability (the reachability problem encodes the existence of the chain of happens-before dependencies mentioned above). The proof of this reduction relies on a non-trivial characterization of anomalies.


Proving Robustness Using Commutativity Dependency Graphs. Based on the reduction above, we also devise an approximate method for checking robustness based on the concept of mover in Lipton’s reduction theory [18]. An event is a left (resp., right) mover if it commutes to the left (resp., right) of another event (from a different process) while preserving the computation. We use the notion of mover to characterize happens-before dependencies between transactions. Roughly, there exists a happens-before dependency between two transactions in some execution if one does not commute to the left/right of the other. We define a commutativity dependency graph which summarizes the happens-before dependencies in all executions of a given program between transactions t as they appear in the program, transactions t \ {w} where the writes of t are deactivated (i.e., their effects are not visible outside the transaction), and transactions t \ {r} where the reads of t obtain non-deterministic values. The transactions t \ {w} are used to simulate issue events of delayed transactions (where writes are not yet visible), while the transactions t \ {r} are used to simulate commit events of delayed transactions (which only write to the shared memory). Two transactions a and b are linked by an edge iff a cannot move to the right of b (or b cannot move to the left of a), or if they are related by the program order (i.e., issued in some order in the same process). Then a program is robust if, for every transaction t, this graph does not contain a path from t \ {w} to t \ {r} formed of transactions that do not write to a variable that t writes to (the latter condition is enforced by SI since two concurrent transactions cannot both commit when they write to a common variable). For example, Fig. 2 shows the commutativity dependency graph of the modified WS program where the read of y is removed from t1. The fact that it does not contain any such path implies that the program is robust.

Fig. 2. Commutativity dependency graph of WS where the read of y is omitted.
3 Programs


A program is a parallel composition of processes distinguished using a set of identifiers P. Each process is a sequence of transactions, and each transaction is a sequence of labeled instructions. Each transaction starts with a begin instruction and finishes with a commit instruction. Every other instruction is either an assignment to a process-local register from a set R or to a shared variable from a set V, or an assume statement. The read/write assignments use values from a data domain D. An assignment to a register ⟨reg⟩ := ⟨var⟩ is called a read of the shared variable ⟨var⟩, and an assignment to a shared variable ⟨var⟩ := ⟨reg-expr⟩ is called a write to ⟨var⟩ (⟨reg-expr⟩ is an expression over registers whose syntax we leave unspecified since it is irrelevant for our development). The statement assume ⟨bexpr⟩ blocks the process if the Boolean expression ⟨bexpr⟩ over registers is false; assume statements are used to model conditionals as usual. We use goto statements to model arbitrary control flow, where the same label can be assigned to multiple instructions and multiple goto statements can direct the control to the same label, which allows us to mimic imperative constructs like loops and conditionals. To simplify the technical exposition, our syntax includes simple read/write instructions. However, our results apply as well to instructions that include SQL (select/update) queries. The experiments reported in Sect. 7 consider programs with SQL-based transactions.

The semantics of a program under SI is defined as follows. The shared vari- 
ables are stored in a central memory and each process keeps a replicated copy 
of the central memory. A process starts a transaction by discarding its local 
copy and fetching the values of the shared variables from the central memory. 
When a process commits a transaction, it merges its local copy of the shared 
variables with the one stored in the central memory in order to make its updates 
visible to all processes. During the execution of a transaction, the process stores 
the writes to shared variables only in its local copy and reads only from its 
local copy. When a process merges its local copy with the central one, it is required that, since its last fetch from the central memory, no other transaction has committed an update to a shared variable that was updated by the current transaction. Otherwise, the transaction is aborted and its effects are discarded.
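The following Python class is a minimal sketch of this central-memory/local-copy semantics. The class and method names are hypothetical, and detecting "concurrent updates after the last fetch" with a logical commit counter is an implementation choice assumed here, not something the text prescribes.

```python
class SIMemory:
    """Central memory plus per-process local copies (toy model of SI)."""

    def __init__(self, init):
        self.central = dict(init)
        self.clock = 0                             # logical commit counter
        self.last_commit = {x: 0 for x in init}    # time of the last committed write to x
        self.local = {}                            # p -> (fetch time, local copy, written vars)

    def begin(self, p):
        # Discard any local copy and fetch the shared variables from the central memory.
        self.local[p] = (self.clock, dict(self.central), set())

    def read(self, p, x):
        # Reads only see the process's local copy.
        return self.local[p][1][x]

    def write(self, p, x, v):
        # Writes are stored only in the local copy until commit.
        _, copy, written = self.local[p]
        copy[x] = v
        written.add(x)

    def commit(self, p):
        fetch_time, copy, written = self.local.pop(p)
        # Abort if a variable written here was updated centrally after our fetch.
        if any(self.last_commit[x] > fetch_time for x in written):
            return False
        self.clock += 1
        for x in written:                          # merge the updates into the central memory
            self.central[x] = copy[x]
            self.last_commit[x] = self.clock
        return True


mem = SIMemory({"x": 0, "y": 0})
mem.begin("p1")
mem.begin("p2")
mem.write("p1", "x", 1)
mem.write("p2", "x", 2)
print(mem.commit("p1"), mem.commit("p2"))  # True False: the second writer of x aborts
```

Disjoint write sets, as in Write Skew, let both transactions commit, which is exactly the relaxation that serializability forbids.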

More precisely, the semantics of a program P under SI is defined as a labeled transition system ⟦P⟧_SI whose transitions are labeled by the set of events

$$\mathit{Ev} = \{\mathit{begin}(p,t),\ \mathit{ld}(p,t,x,v),\ \mathit{isu}(p,t,x,v),\ \mathit{com}(p,t) : p \in \mathbb{P},\ t \in \mathbb{T},\ x \in \mathbb{V},\ v \in \mathbb{D}\}$$

where begin and com label transitions corresponding to the start and the commit of a transaction, respectively, and isu and ld label transitions corresponding to writing, resp., reading, a shared variable during some transaction.

An execution of program P under snapshot isolation is a sequence of events ev1 · ev2 · … corresponding to a run of ⟦P⟧_SI. The set of executions of P under SI is denoted by Ex_SI(P).
4 Robustness Against SI


A trace abstracts the order in which shared variables are accessed inside a transaction and the order between transactions accessing different variables. Formally, the trace of an execution ρ is obtained by (1) replacing each sub-sequence of transitions in ρ corresponding to the same transaction, but excluding the com transition, with a single “macro-event” isu(p, t), and (2) adding several standard relations between these macro-events isu(p, t) and commit events com(p, t) to record the data-flow in ρ, e.g., which transaction wrote the value read by another transaction. The sequence of isu(p, t) and com(p, t) events obtained in the first step is called a summary of ρ. We say that a transaction t in ρ performs an external read of a variable x if ρ contains an event ld(p, t, x, v) which is not preceded by a write on x of t, i.e., an event isu(p, t, x, v). Also, we say that a transaction t writes a variable x if ρ contains an event isu(p, t, x, v), for some v.

The trace tr(ρ) = (τ, PO, WR, WW, RW, STO) of an execution ρ consists of the summary τ of ρ along with the program order PO, which relates any two issue events isu(p, t) and isu(p, t′) that occur in this order in τ; the write-read relation WR (also called read-from), which relates any two events com(p, t) and isu(p′, t′) that occur in this order in τ such that t′ performs an external read of x and com(p, t) is the last event in τ before isu(p′, t′) that writes to x (to mark the variable x, we may use WR(x)); the write-write order WW (also called store-order), which relates any two commit events com(p, t) and com(p′, t′) that occur in this order in τ and write to the same variable x (to mark the variable x, we may use WW(x)); the read-write relation RW (also called conflict), which relates any two events isu(p, t) and com(p′, t′) that occur in this order in τ such that t reads a value that is overwritten by t′; and the same-transaction relation STO, which relates the issue event with the commit event of the same transaction. The read-write relation RW is formally defined as

$$\mathsf{RW}(x) = \mathsf{WR}^{-1}(x);\mathsf{WW}(x), \qquad \mathsf{RW} = \bigcup_{x \in \mathbb{V}} \mathsf{RW}(x)$$

(we use ; to denote the standard composition of relations). If a transaction t reads the initial value of x, then RW(x) relates isu(p, t) to com(p′, t′) of any other transaction t′ which writes to x, i.e., (isu(p, t), com(p′, t′)) ∈ RW(x). Note that in the above relations, p and p′ might designate the same process.
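As an illustration, the sketch below derives WR, WW, and RW from a summary. The summary encoding and the function name `trace_relations` are hypothetical; each issue entry carries the variables the transaction reads externally and the variables it writes, and PO and STO are omitted for brevity. RW is computed as WR⁻¹(x);WW(x), plus the initial-value case described above.

```python
from collections import defaultdict

def trace_relations(summary):
    """Compute WR, WW and RW from a summary.

    summary: list of ("isu", t, reads, writes) and ("com", t) entries, where
    reads are the variables t reads externally and writes those t writes.
    Relations are sets of (source_event, target_event, x), with events
    ("isu", t) or ("com", t). PO and STO are omitted in this sketch.
    """
    writes_of = {ev[1]: ev[3] for ev in summary if ev[0] == "isu"}
    last_writer = {}                  # x -> transaction of the latest com writing x
    committed = defaultdict(list)     # x -> transactions that committed a write to x
    reads_from = []                   # (reader, x, writer or None for the initial value)
    wr, ww = set(), set()
    for ev in summary:
        if ev[0] == "isu":
            t, reads = ev[1], ev[2]
            for x in reads:
                w = last_writer.get(x)
                reads_from.append((t, x, w))
                if w is not None:
                    wr.add((("com", w), ("isu", t), x))
        else:
            t = ev[1]
            for x in writes_of[t]:
                for earlier in committed[x]:
                    ww.add((("com", earlier), ("com", t), x))
                committed[x].append(t)
                last_writer[x] = t
    rw = set()
    for t, x, w in reads_from:
        if w is None:
            # Reading the initial value conflicts with every other writer of x.
            for other, ws in writes_of.items():
                if other != t and x in ws:
                    rw.add((("isu", t), ("com", other), x))
        else:
            # RW(x) = WR^{-1}(x) ; WW(x)
            for src, dst, y in ww:
                if y == x and src == ("com", w):
                    rw.add((("isu", t), dst, x))
    return wr, ww, rw

# Summary of the WS execution: both issues precede both commits.
ws_summary = [("isu", "t1", {"y"}, {"x"}), ("isu", "t2", {"x"}, {"y"}),
              ("com", "t2"), ("com", "t1")]
print(trace_relations(ws_summary)[2])  # the two RW edges between t1 and t2
```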

Since we reason about only one trace at a time, to simplify the writing, we may say that a trace is simply a sequence τ as above, keeping the relations PO, WR, WW, RW, and STO implicit. The set of traces of executions of a program P under SI is denoted by Tr_SI(P).


Serializability Semantics. The semantics of a program under serializability can be defined using a transition system where the configurations keep a single shared-variable valuation (accessed by all processes) with the standard interpretation of read and write statements. Each transaction executes in isolation. Alternatively, the serializability semantics can be defined as a restriction of ⟦P⟧_SI to the set of executions where each transaction is committed immediately after it starts, i.e., the start and commit times of each transaction coincide (t.st = t.ct). Such executions are called serializable, and the set of serializable executions of a program P is denoted by Ex_SER(P). The latter definition is easier to reason about when relating executions under snapshot isolation and serializability, respectively.


Serializable Trace. A trace tr is called serializable if it is the trace of a serializable execution. Let Tr_SER(P) denote the set of serializable traces. Given a serializable trace tr = (τ, PO, WR, WW, RW, STO), every event isu(p, t) in τ is immediately followed by the corresponding com(p, t) event.


Happens-Before Order. Since multiple executions may have the same trace, it is possible that an execution ρ produced by snapshot isolation has a serializable trace tr(ρ) even though isu(p, t) events may not be immediately followed by com(p, t) events. However, ρ would be equivalent, up to reordering of “independent” (or commutative) transitions, to a serializable execution. To check whether the trace of an execution is serializable, we introduce the happens-before relation on the events of a given trace as the transitive closure of the union of all the relations in the trace:

$$\mathsf{HB} = (\mathsf{PO} \cup \mathsf{WW} \cup \mathsf{WR} \cup \mathsf{RW} \cup \mathsf{STO})^{+}$$

Finally, the happens-before relation between events is extended to transactions as follows: a transaction t1 happens-before another transaction t2 ≠ t1 if the trace tr contains an event of transaction t1 which happens-before an event of t2. The happens-before relation between transactions is denoted by HB_t and called transactional happens-before. The following theorem characterizes serializable traces.


Theorem 1 ([1,23]). A trace tr is serializable iff HB_t is acyclic.
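Theorem 1 suggests a direct check, sketched below under simplifying assumptions (the function name and event encoding are hypothetical). Rather than building HB on events first, the sketch lifts each base edge between events of distinct transactions to a transaction edge. Every lifted edge belongs to HB_t, and every HB_t edge is witnessed by a chain of lifted edges, so the lifted graph has a cycle iff HB_t does. Cycles are then detected with Warshall's transitive-closure algorithm.

```python
def transactional_hb_is_acyclic(pairs):
    """pairs: (source_event, target_event) from PO, WR, WW, RW, STO, where an
    event is a (kind, transaction) tuple such as ("isu", "t1")."""
    # Lift event-level edges to edges between distinct transactions.
    edges = {(a[1], b[1]) for a, b in pairs if a[1] != b[1]}
    nodes = {t for edge in edges for t in edge}
    reach = set(edges)
    # Warshall's algorithm: transitive closure over transactions.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    # A transaction reaching itself means HB_t is cyclic.
    return not any((t, t) in reach for t in nodes)

# WS trace: the two RW edges form a cycle, so the trace is not serializable.
ws_pairs = [(("isu", "t1"), ("com", "t2")), (("isu", "t2"), ("com", "t1")),
            (("isu", "t1"), ("com", "t1")), (("isu", "t2"), ("com", "t2"))]
print(transactional_hb_is_acyclic(ws_pairs))  # False
```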


A program is called robust if it produces the same set of traces as the seri- 
alizability semantics. 


Definition 1. A program P is called robust against SI iff Tr_SI(P) = Tr_SER(P).


Since Tr_SER(P) ⊆ Tr_SI(P), the problem of checking robustness of a program P reduces to checking whether there exists a trace tr ∈ Tr_SI(P) \ Tr_SER(P).


5 Reducing Robustness Against SI to SC Reachability 


A trace which is not serializable must contain at least one issue event and the commit event of the same transaction that do not occur one after the other, even after reordering of “independent” events. Thus, there must exist an event that occurs between the two and is related to both events via the happens-before relation, forbidding the issue and the commit to be adjacent. Otherwise, we could build another trace with the same happens-before where events are reordered such that the issue is immediately followed by the corresponding commit. The latter is a serializable trace, which contradicts the initial assumption. We define a program instrumentation which mimics the delay of transactions by doing the writes on auxiliary variables which are not visible to other transactions. After the delay of a transaction, we track happens-before dependencies until we execute a transaction that does a “read” on one of the variables that the delayed transaction writes to (this would expose a read-write dependency to the commit event of the delayed transaction). While tracking happens-before dependencies, we cannot execute a transaction that writes to a variable that the delayed transaction writes to, since SI forbids write-write conflicts between concurrent transactions.

Concretely, given a program P, we define an instrumentation of P such that P is not robust against SI iff the instrumentation reaches an error state under serializability. The instrumentation uses auxiliary variables in order to simulate a single delayed transaction, which we prove is enough for deciding robustness. Let isu(p, t) be the issue event of the only delayed transaction. The process p that delayed t is called the Attacker. When the attacker finishes executing the delayed transaction, it stops. Other processes that execute transactions afterwards are called Happens-Before Helpers.

The instrumentation uses two copies of the set of shared variables in the 
original program to simulate the delayed transaction. We use primed variables 
x’ to denote the second copy. Thus, when a process becomes the attacker, it will 
only write to the second copy that is not visible to other processes including the 
happens-before helpers. The writes made by the other processes including the 
happens-before helpers are made visible to all processes. 

When the attacker delays the transaction t, it keeps track of the variables it accesses: it stores the name x of one of the variables it writes to, and it tracks every variable y that it reads from and every variable z that it writes to. When the attacker finishes executing t and some other process wants to execute some other transaction, the underlying transaction must contain a write to a variable y that the attacker reads from. Also, the underlying transaction must not write to a variable that t writes to. We say that this process has joined the happens-before helpers through the underlying transaction. While executing this
transaction, we keep track of each variable that was accessed and the type of 
operation, whether it is a read or write. Afterward, in order for some other trans- 
action to “join” the happens-before path, it must not write to a variable that t 
writes to so it does not violate the fact that SI forbids write-write conflicts, and 
it has to satisfy one of the following conditions in order to ensure the continuity 
of the happens-before dependencies: (1) the transaction is issued by a process 
that has already another transaction in the happens-before dependency (pro- 
gram order dependency), (2) the transaction is reading from a shared variable 
that was updated by a previous transaction in the happens-before dependency 
(write-read dependency), (3) the transaction writes to a shared variable that 
was read by a previous transaction in the happens-before dependency (read- 
write dependency), or (4) the transaction writes to a shared variable that was 
updated by a previous transaction in the happens-before dependency (write- 
write dependency). We introduce a flag for each shared variable to mark the 
fact that the variable was read or written by a previous transaction. 

Processes continue executing transactions as part of the chain of happens- 
before dependencies, until a transaction does a read on the variable x that t 
wrote to. In this case, we reached an error state which signals that we found a 
cycle in the transactional happens-before relation. 
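The joining conditions can be summarized by a small Python predicate. This is a sketch under simplifying assumptions: the per-variable flags are abstracted into sets, each transaction after the delayed one is given as a hypothetical tuple (process, variables read, variables written), and only transactions that are meant to join the chain are passed in. The function name `exposes_violation` is hypothetical.

```python
def exposes_violation(delayed_reads, delayed_writes, x, helpers):
    """Do the helper transactions extend a happens-before chain from the
    delayed transaction that ends with a read of x (the error state)?

    delayed_reads / delayed_writes: variables read / written by the delayed
    transaction t; x: the written variable chosen by the attacker;
    helpers: list of (process, reads, writes) executed after t.
    """
    assert x in delayed_writes
    hb_read = set(delayed_reads)    # variables read on the chain (t's reads are private but count)
    hb_written = set()              # variables written by helpers on the chain
    hb_procs = set()                # processes with a transaction on the chain
    for proc, reads, writes in helpers:
        if writes & delayed_writes:
            return False            # SI forbids a write-write conflict with t
        joins = (proc in hb_procs          # (1) program order
                 or reads & hb_written     # (2) write-read
                 or writes & hb_read       # (3) read-write
                 or writes & hb_written)   # (4) write-write
        if not joins:
            return False            # the pledge is not kept: execution blocked
        hb_procs.add(proc)
        hb_read |= reads
        hb_written |= writes
        if x in reads:
            return True             # error state: cycle in transactional happens-before
    return False

# Write Skew: t1 reads y and writes x; t2 reads x and writes y.
print(exposes_violation({"y"}, {"x"}, "x", [("p2", {"x"}, {"y"})]))  # True
```

Because the attacker's writes are private, `hb_written` starts empty, so the first helper can only join through condition (3), i.e., by writing a variable the delayed transaction read. This matches the requirement stated above for the first transaction.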
The instrumentation uses three kinds of flags: a) global flags (i.e., HB, atr_a, ast_a), b) flags local to a process (i.e., p.a and p.hbh), and c) flags per shared variable (i.e., x.event, x.event′, and x.eventI). We will explain the meaning of these flags along with the instrumentation. At the start of the execution, all flags are initialized to null (⊥).

Whether a process is an attacker or happens-before helper is not enforced 
syntactically by the instrumentation. It is set non-deterministically during the 
execution using some additional process-local flags. Each process chooses to set 
to true at most one of the flags p.a and p.hbh, implying that the process becomes 
an attacker or happens-before helper, respectively. At most one process can be 
an attacker, i.e., set p.a to true. In the following, we detail the instrumentation 
for read and write instructions of the attacker and happens-before helpers. 


5.1 Instrumentation of the Attacker 


Figure 3 lists the instrumentation of the write and read instructions of the attacker. Each process passes through an initial phase where it executes transactions that are visible immediately to all the other processes (i.e., they are not delayed), and then it can non-deterministically choose to delay a transaction, at which point it sets the flag atr_a to true. During the delayed transaction, it non-deterministically chooses a write instruction to a variable x and stores the name of this variable in the flag ast_a (line (5)). The values written during the delayed transaction are stored in the primed variables and are visible only to the current transaction, in case the transaction reads its own writes. For example, given a variable z, all writes to z from the original program are transformed into writes to the primed version z′ (line (3)). Each time the attacker writes to z, it sets the flag z.event′ = 1. This flag is used later by transactions from happens-before helpers to avoid writing to variables that the delayed transaction writes to.


⟦l1: x := e; goto l2;⟧_A =
  // Write before the delayed transaction
  l1: assume atr_a = ⊥; goto lx1;
  lx1: x := e; goto l2;
  // Write in the delayed transaction
  l1: assume atr_a ≠ ⊥ ∧ p.a ≠ ⊥; goto lx2;
  lx2: x′ := e; goto lx3;                                   (3)
  lx3: x.event′ := 1; goto l2;                              (4)
  // Special write in the delayed transaction
  l1: assume ast_a = ⊥ ∧ x.event = ⊥ ∧ atr_a ≠ ⊥; goto lx4;
  lx4: x′ := e; goto lx5;
  lx5: ast_a := ‘x’; goto lx6;                              (5)
  lx6: x.event′ := 1; goto l2;

⟦l1: r := x; goto l2;⟧_A =
  // Read before the delayed transaction
  l1: assume atr_a = ⊥; goto lx1;
  lx1: r := x; goto l2;
  // Read in the delayed transaction
  l1: assume atr_a ≠ ⊥ ∧ p.a ≠ ⊥; goto lx2;
  lx2: r := x′; goto lx3;
  lx3: x.event := ld; goto lx4;                             (1)
  lx4: assume HB = ⊥; goto lx5;
  lx5: HB := true; goto l2;                                 (2)
  lx4: assume HB ≠ ⊥; goto l2;

Fig. 3. Instrumentation of the Attacker. We use ‘x’ to denote the name of the shared variable x.
A read on a variable y in the delayed transaction takes its value from the primed version y′. In every read in the delayed transaction, we set the flag y.event to ld (line (1)), to be used later in order for a process to join the happens-before helpers. Afterward, the attacker starts the happens-before path, and it sets the variable HB to true (line (2)) to mark the start of the happens-before path. When the flag HB is set to true, the attacker stops executing new transactions.


5.2 Instrumentation of the Happens-Before Helpers 


The remaining processes, which are not the attacker, can become happens-before helpers. Figure 4 lists the instrumentation of write and read instructions of a happens-before helper. In a first phase, each process executes the original code until the flag atr_a is set to true by the attacker. This flag signals the “creation” of the secondary copy of the shared variables, which can be observed only by the attacker. At this point, the flag HB is set to true, and the happens-before helper process non-deterministically chooses a first transaction through which it wants to join the set of happens-before helpers, i.e., continue the happens-before dependency created by the existing happens-before helpers. When a process chooses a transaction, it makes a pledge (while executing the begin instruction) that during this transaction it will either read from a variable that was written to by another happens-before helper, write to a variable that was accessed (read or written) by another happens-before helper, or write to a variable that was read from in the delayed transaction. When the pledge is met, the process sets the flag p.hbh to true (lines (7) and (11)). The execution is blocked if a process does not keep its pledge (i.e., the flag p.hbh is null) at the end of the transaction. Note that the first process to join the happens-before helpers has to execute a transaction t which writes to a variable that was read from in the delayed transaction, since this is the only way to build a happens-before dependency between t and the delayed transaction (PO is not possible since t is not from the attacker, WR is not possible since t does not see the writes of the delayed transaction, and WW is not possible since t cannot write to a variable that the delayed transaction writes to). We use a flag x.event for each variable x to record the type (read ld or write st) of the last access made by a happens-before helper (lines (8) and (10)). During the execution of a transaction that is part of the happens-before dependency, we must ensure that the transaction does not write to a variable y where y.event′ is set to 1. Otherwise, the execution is blocked (line (9)).

The happens-before helpers continue executing their instructions until one of them reads from the shared variable x whose name was stored in ast_a. This establishes a happens-before dependency between the delayed transaction and a “fictitious” store event corresponding to the delayed transaction that could be executed just after this read of x. The execution does not have to contain this store event explicitly since it is always enabled. Therefore, at the end of every transaction, the instrumentation checks whether the transaction read x. If so, the execution stops and goes to an error state to indicate a robustness violation. Notice that after the attacker stops, the only processes that are executing transactions are happens-before helpers. This is justified since, when a process is not a happens-before helper, we cannot construct a happens-before dependency between a transaction of this process and the delayed transaction. Hence, the two transactions commute, which in turn implies that this process’s transactions can be executed before the delayed transaction of the attacker.


5.3 Correctness 


The role of a process in an execution is chosen non-deterministically at runtime. Therefore, the final instrumentation of a given program P, denoted by ⟦P⟧, is obtained by replacing each labeled instruction ⟨linst⟩ with the concatenation of the instrumentations corresponding to the attacker and the happens-before helpers, i.e., ⟦⟨linst⟩⟧ ::= ⟦⟨linst⟩⟧_A ⟦⟨linst⟩⟧_HbH.

The following theorem states the correctness of the instrumentation. 


Theorem 2. P is not robust against SI iff ⟦P⟧ reaches the error state.


If a program is not robust, then executing it under SI produces a trace whose happens-before relation is cyclic, which is possible only if the execution contains at least one delayed transaction. In the proof of this theorem, we show that it is sufficient to search for executions that contain a single delayed transaction.

Notice that in the instrumentation of the attacker, the delayed transaction must contain a read and a write instruction on different variables. Also, the transactions of the happens-before helpers must not contain a write to a variable that the delayed transaction writes to. The following corollary states the complexity of checking robustness of finite-state programs¹ against snapshot isolation. It is a direct consequence of Theorem 2 and of previous results concerning the reachability problem in concurrent programs running over a sequentially-consistent memory, with a fixed [17] or parametric number of processes [22].


[Fig. 4 listing: ⟦l1: x := e; goto l2;⟧_HbH and ⟦l1: r := x; goto l2;⟧_HbH, each with a case before and a case after the delayed transaction; lines (7) to (11) are the ones referenced in the text.]

Fig. 4. Instrumentation of happens-before helpers.


1 Programs with a bounded number of variables taking values from a bounded domain. 
Corollary 1. Checking robustness of finite-state programs against snapshot isolation is PSPACE-complete when the number of processes is fixed, and EXPSPACE-complete otherwise.


The instrumentation can be extended to SQL (select/update) queries where a 
statement may include expressions over a finite/infinite set of variables, e.g., by 
manipulating a set of flags x.event for each statement instead of only one. 


6 Proving Program Robustness 


As a more pragmatic alternative to the reduction in the previous section, we 
define an approximated method for proving robustness which is inspired by Lip- 
ton’s reduction theory [18]. 


Movers. Given an execution $\tau = ev_1 \cdot \ldots \cdot ev_n$ of a program P under serializability (where each event $ev_i$ corresponds to executing an entire transaction), we say that the event $ev_i$ moves right (resp., left) in τ if $ev_1 \cdot \ldots \cdot ev_{i-1} \cdot ev_{i+1} \cdot ev_i \cdot ev_{i+2} \cdot \ldots \cdot ev_n$ (resp., $ev_1 \cdot \ldots \cdot ev_{i-2} \cdot ev_i \cdot ev_{i-1} \cdot ev_{i+1} \cdot \ldots \cdot ev_n$) is also a valid execution of P, the process of $ev_i$ is different from the process of $ev_{i+1}$ (resp., $ev_{i-1}$), and both executions reach the same end state $\sigma_n$. For an execution τ, let $\mathit{instOf}_\tau(ev_i)$ denote the transaction that generated the event $ev_i$. A transaction t of a program P is a right (resp., left) mover if, for all executions τ of P under serializability, the event $ev_i$ with $\mathit{instOf}_\tau(ev_i) = t$ moves right (resp., left) in τ.
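The following sketch checks the "moves right" condition for one event of one concrete execution. It is a toy model under stated assumptions: each transaction is a hypothetical Python function from a state to a new state (or `None` if an assume statement blocks), registers are kept in the state so that read values are part of the end state, and the function names are illustrative. The mover definition itself quantifies over all executions, which this single-execution check does not attempt.

```python
def run(execution, init):
    """Execute (process, transaction) pairs serially; a transaction maps a
    state to a new state, or to None if one of its assume statements blocks."""
    state = dict(init)
    for _process, txn in execution:
        state = txn(state)
        if state is None:
            return None
    return state

def moves_right(execution, i, init):
    """Does the i-th event move right past the (i+1)-th event of this execution?"""
    if i + 1 >= len(execution):
        raise IndexError("event i has no right neighbour")
    if execution[i][0] == execution[i + 1][0]:
        return False                       # both events come from the same process
    swapped = execution[:i] + [execution[i + 1], execution[i]] + execution[i + 2:]
    end, swapped_end = run(execution, init), run(swapped, init)
    # Both orders must be valid executions and reach the same end state.
    return end is not None and swapped_end is not None and end == swapped_end

write_x = ("p1", lambda s: {**s, "x": 1})          # x := 1
read_x = ("p2", lambda s: {**s, "r2": s["x"]})     # r2 := x
write_y = ("p2", lambda s: {**s, "y": 2})          # y := 2
init = {"x": 0, "y": 0, "r2": 0}
print(moves_right([write_x, read_x], 0, init))   # False: write-read dependency
print(moves_right([write_x, write_y], 0, init))  # True: independent variables
```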

If a transaction t is not a right mover, then there must exist an execution τ of P under serializability and an event $ev_i$ of τ with $\mathit{instOf}_\tau(ev_i) = t$ that does not move right. This implies that there must exist another event $ev_{i+1}$ of τ which prevents $ev_i$ from being a right mover. Since $ev_i$ and $ev_{i+1}$ do not commute, this must be because of either a write-read, write-write, or read-write dependency. If $t' = \mathit{instOf}_\tau(ev_{i+1})$, we say that t is not a right mover because of t′ and some dependency that is either write-read, write-write, or read-write. Notice that when t is not a right mover because of t′, then t′ is not a left mover because of t.

We define $M_{wr}$ as a binary relation between transactions such that $(t, t') \in M_{wr}$
when t is not a right mover because of t' and a write-read dependency. We
define the relations $M_{ww}$ and $M_{rw}$ corresponding to write-write and read-write
dependencies in a similar way.
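The three dependency kinds can be read off the shared variables that two events actually access in a given execution. The sketch below (a hypothetical helper `dependencies`) classifies the dependency from $ev_i$ to a later event $ev_j$; because the access sets come from one concrete execution, this is a per-execution check rather than the syntactic approximation used by static dependency graphs. A non-empty result is necessary, but not sufficient, for the two events not to commute.

```python
from typing import Set


def dependencies(reads_i: Set[str], writes_i: Set[str],
                 reads_j: Set[str], writes_j: Set[str]) -> Set[str]:
    """Dependency kinds from ev_i to a later ev_j, given the shared
    variables each event actually read and wrote in the execution."""
    kinds = set()
    if writes_i & reads_j:
        kinds.add("wr")  # ev_j reads a variable that ev_i wrote
    if writes_i & writes_j:
        kinds.add("ww")  # both events write the same variable
    if reads_i & writes_j:
        kinds.add("rw")  # ev_j overwrites a variable that ev_i read
    return kinds
```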


Read/Write-free Transactions. Given a transaction t, we define $t \setminus \{r\}$ as a
variation of t where all the reads from shared variables are replaced with non-deterministic
reads, i.e., ⟨reg⟩ := ⟨var⟩ statements are replaced with ⟨reg⟩ := *,
where * denotes non-deterministic choice. We also define $t \setminus \{w\}$ as a variation of t
where all the writes to shared variables in t are disabled. Intuitively, recalling the
reduction to SC reachability in Sect. 5, $t \setminus \{w\}$ simulates the delay of a transaction
by the Attacker, i.e., the writes are not made visible to other processes, and $t \setminus \{r\}$
approximates the commit of the delayed transaction, which only applies a set of
writes.
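A rough sketch of the two variations, assuming a toy transaction format (a list of `("read", reg, var)` and `("write", var, reg)` operations) that is not the paper's language. `read_free` turns every shared read into a hypothetical `"havoc"` operation that picks an arbitrary value, and `write_free` drops the shared writes; the small interpreter `execute` approximates non-deterministic choice with a random pick from a finite domain.

```python
import random
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # ("read", reg, var), ("write", var, reg) or ("havoc", reg, "*")


def read_free(t: List[Op]) -> List[Op]:
    """Read-free variation t \\ {r}: each <reg> := <var> becomes <reg> := *."""
    return [("havoc", op[1], "*") if op[0] == "read" else op for op in t]


def write_free(t: List[Op]) -> List[Op]:
    """Write-free variation t \\ {w}: shared writes are dropped (never made visible)."""
    return [op for op in t if op[0] != "write"]


def execute(t: List[Op], shared: Dict[str, int], domain=range(3), rng=random) -> Dict[str, int]:
    """Run a toy transaction on a copy of the shared state."""
    regs: Dict[str, int] = {}
    shared = dict(shared)
    for kind, a, b in t:
        if kind == "read":
            regs[a] = shared[b]
        elif kind == "havoc":
            regs[a] = rng.choice(list(domain))  # stands in for non-deterministic choice
        elif kind == "write":
            shared[a] = regs[b]
    return shared
```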
Commutativity Dependency Graph. Given a program P, we define the
commutativity dependency graph as a graph where vertices represent transactions
and their read/write-free variations. Two vertices which correspond to the
original transactions in P are related by a program order edge if they belong
to the same process. The other edges in this graph represent the “non-mover”
relations $M_{wr}$, $M_{ww}$, and $M_{rw}$.

Given a program P, we say that the commutativity dependency graph of P
contains a non-mover cycle if there exists a set of transactions $t_0, t_1, \ldots, t_n$ of P
such that the following hold:


(a) $(t_0 \setminus \{w\}, t_1) \in M_{rw}$, where $t_0 \setminus \{w\}$ is the write-free variation of $t_0$, and $t_1$ does not write
to a variable that $t_0$ writes to;

(b) for all $i \in [1, n-1]$, $(t_i, t_{i+1}) \in (PO \cup M_{wr} \cup M_{ww} \cup M_{rw})$, and $t_i$ and $t_{i+1}$ do not
write to a shared variable that $t_0$ writes to;

(c) $(t_n, t_0 \setminus \{r\}) \in M_{rw}$, where $t_0 \setminus \{r\}$ is the read-free variation of $t_0$, and $t_n$ does not write
to a variable that $t_0$ writes to.
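Once the relations $M_{wr}$, $M_{ww}$, $M_{rw}$ and the program order are known (the paper obtains the mover facts with Civl), detecting a non-mover cycle is a graph search. The following Python sketch is one possible encoding, not the authors' implementation: for each candidate $t_0$ it starts from the transactions allowed by (a), follows PO and non-mover edges through transactions that do not write to a variable $t_0$ writes to, as required by (b), and reports a cycle when it reaches a $t_n$ satisfying (c). The input format, in particular the separate edge sets for the write-free and read-free variations, is an assumption.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def has_non_mover_cycle(
    txns: List[Hashable],
    writes: Dict[Hashable, Set[str]],
    edges: Set[Edge],
    rw_from_write_free: Set[Edge],
    rw_to_read_free: Set[Edge],
) -> bool:
    """Search for t0, t1, ..., tn satisfying conditions (a)-(c).

    writes:             shared variables each original transaction may write
    edges:              PO, Mwr, Mww and Mrw edges between original transactions
    rw_from_write_free: pairs (t0, t1) with (t0 \\ {w}, t1) in Mrw, condition (a)
    rw_to_read_free:    pairs (tn, t0) with (tn, t0 \\ {r}) in Mrw, condition (c)
    """
    succ: Dict[Hashable, List[Hashable]] = {t: [] for t in txns}
    for a, b in edges:
        succ[a].append(b)
    for t0 in txns:
        def allowed(t: Hashable) -> bool:
            # t1, ..., tn must not write to a variable that t0 writes to
            return not (writes[t] & writes[t0])

        frontier = deque(t1 for (s, t1) in rw_from_write_free if s == t0 and allowed(t1))
        seen = set(frontier)
        while frontier:  # breadth-first search along the edges allowed by (b)
            t = frontier.popleft()
            if (t, t0) in rw_to_read_free:  # (c) closes the cycle at t0 \ {r}
                return True
            for u in succ[t]:
                if u not in seen and allowed(u):
                    seen.add(u)
                    frontier.append(u)
    return False
```

By Theorem 3 below, a `False` result on a correctly computed graph means the program is robust; a `True` result only signals a potential violation, since the method is approximate.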


A non-mover cycle approximates an execution of the instrumentation defined
in Sect. 5 in between the moment that the Attacker delays a transaction $t_0$ (which
here corresponds to the write-free variation $t_0 \setminus \{w\}$) and the moment where $t_0$ gets
committed (the read-free variation $t_0 \setminus \{r\}$).

The following theorem shows that the absence of non-mover cycles in the
commutativity dependency graph of a program implies the robustness of the program.
The notion of robustness in this theorem relies on a slightly different notion of trace
where store-order and write-order dependencies take values into account, i.e.,
store-order relates only writes that write different values, and write-order relates
a read to the oldest write (w.r.t. execution order) that writes the value it reads. This
relaxation helps avoid some harmless robustness violations caused, for instance, by
two transactions writing the same value to some variable.


Theorem 3. For a program P, if the commutativity dependency graph of P does 
not contain non-mover cycles, then P is robust. 


7 Experiments 


To test the applicability of our robustness checking algorithms, we have con- 
sidered a benchmark of 10 applications extracted from the literature related to 
weakly consistent databases in general. A first set of applications are open source 
projects that were implemented to be run over the Cassandra database, extracted 
from [11]. The second set of applications is composed of: TPC-C [24], an on-line 
transaction processing benchmark widely used in the database community, Small- 
Bank, a simplified representation of a banking application [2], FusionTicket, a 
movie ticketing application [16], Auction, an online auction application [6], and 
Courseware, a course registration service extracted from [14,19]. 
Table 1. An overview of the analysis results. CDG stands for commutativity dependency
graph. The columns PO and PT show the number of proof obligations and the
proof time in seconds, respectively. T stands for trivial when the application has only
read-only transactions.


Application        | #Transactions | Robustness | Reachability PO | Reachability PT (s) | CDG PO | CDG PT (s)
Auction            | 4             | ✓          | 70              | 0.3                 | 20     | 0.5
Courseware         | 5             | ✗          | 59              | 0.37                | na     | na
Fusion Ticket      | 4             | ✓          | 72              | 0.3                 | 34     | 0.5
SmallBank          | 5             | ✗          | 48              | 0.28                | na     | na
TPC-C              | 5             | ✓          | 54              | 0.7                 | 82     | 3.7
Cassieq-Core       | 8             | ✓          | 173             | 0.55                | 104    | 2.9
Currency-Exchange  | 6             | ✓          | 88              | 0.35                | 26     | 3.5
PlayList           | 14            | ✓          | 99              | 4.63                | 236    | 7.3
RoomStore          | 5             | ✓          | 85              | 0.3                 | 22     | 0.5
Shopping-Cart      | 4             | ✓          | 58              | 0.25                | T      | T


A first experiment concerns the reduction of robustness checking to SC reach- 
ability. For each application, we have constructed a client (i.e., a program com- 
posed of transactions defined within that application) with a fixed number of pro- 
cesses (at most 3) and a fixed number of transactions (between 3 and 7 transac- 
tions per process). We have encoded the instrumentation of this client, defined in 
Sect. 5, in the Boogie programming language [3] and used the Civl verifier [15] in 
order to check whether the assertions introduced by the instrumentation are vio- 
lated (which would represent a robustness violation). Note that since clients are 
of fixed size, this requires no additional assertions/invariants (it is an instance of 
bounded model checking). The results are reported in Table 1. We have found two 
of the applications, Courseware and SmallBank, to not be robust against snapshot 
isolation. The violation in Courseware is caused by transactions RemoveCourse 
and EnrollStudent that execute concurrently, RemoveCourse removing a course 
that has no registered student and EnrollStudent registering a new student to the 
same course. We get an invalid state where a student is registered for a course that 
was removed. SmallBank’s violation contains transactions Balance, TransactSav- 
ing, and WriteCheck. One process executes WriteCheck where it withdraws an 
amount from the checking account after checking that the sum of the checking and 
savings accounts is bigger than this amount. Concurrently, a second process exe- 
cutes TransactSaving where it withdraws an amount from the saving account after 
checking that it is smaller than the amount in the savings account. Afterwards, the 
second process checks the contents of both the checking and saving accounts. We 
get an invalid state where the sum of the checking and savings accounts is negative. 
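The Courseware violation is an instance of write skew: the two transactions read the same snapshot, write disjoint data, and therefore both commit under snapshot isolation. The sketch below replays it on a hypothetical, simplified data model (a set of courses and a set of enrollments; the names are illustrative, not the benchmark's code) and contrasts it with both serial orders. Snapshot isolation is approximated by letting every transaction read the initial snapshot and aborting a transaction only on a write-write conflict with an earlier committer.

```python
def remove_course(db, course):
    """RemoveCourse: delete the course only if no student is enrolled in it."""
    if course in db["courses"] and all(c != course for _, c in db["enrollments"]):
        return {"courses": db["courses"] - {course}}
    return {}


def enroll_student(db, student, course):
    """EnrollStudent: register the student only if the course exists."""
    if course in db["courses"]:
        return {"enrollments": db["enrollments"] | {(student, course)}}
    return {}


def run_serial(db, txns):
    """Serial baseline: each transaction sees the previous one's writes."""
    for txn in txns:
        db = {**db, **txn(db)}
    return db


def run_si(db, txns):
    """Concurrent run under (simplified) snapshot isolation."""
    committed, written = dict(db), set()
    for txn in txns:
        writes = txn(db)              # every transaction reads the initial snapshot
        if written & set(writes):     # first committer wins: this one aborts
            continue
        written |= set(writes)
        committed.update(writes)
    return committed


def consistent(db):
    """Invariant: every enrollment refers to an existing course."""
    return all(c in db["courses"] for _, c in db["enrollments"])


init = {"courses": frozenset({"cs101"}), "enrollments": frozenset()}


def remove(db):
    return remove_course(db, "cs101")


def enroll(db):
    return enroll_student(db, "alice", "cs101")
```

Both serial orders preserve the invariant, whereas `run_si(init, [remove, enroll])` ends with a student enrolled in a course that no longer exists.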

Since in the first experiment we consider fixed clients, the lack of assertion violations
does not imply that the application is robust (this instantiation of our reduction
can only be used to reveal robustness violations). Thus, a second experiment
concerns the robustness proof method based on commutativity dependency graphs
(Sect. 6). For the applications that were not identified as non-robust by the previous
method, we have used Civl to construct their commutativity dependency
graphs, i.e., identify the “non-mover” relations $M_{wr}$, $M_{ww}$, and $M_{rw}$ (Civl can
check whether some code fragment is a left/right mover). In all cases, the graph
did not contain non-mover cycles, which allows us to conclude that the applications
are robust.

The experiments show that our results can be used for finding violations and 
proving robustness, and that they apply to a large set of interesting examples. 
Note that the reduction to SC and the proof method based on commutativity
dependency graphs are valid for programs with SQL (select/update) queries.


8 Related Work 


Decidability and complexity of robustness have been investigated in the context of
relaxed memory models such as TSO and Power [7,9,13]. Our work borrows some
high-level principles from [7], which addresses robustness against TSO. We
reuse the high-level methodology of characterizing minimal violations according
to some measure and defining reductions to SC reachability using a program
instrumentation. Instantiating this methodology in our context is, however, very
different, with several fundamental differences:


— SI and TSO admit different sets of relaxations and SI is a model of trans- 
actional databases. 

— We use a different notion of measure: the measure in [7] counts the number of 
events between a write issue and a write commit while our notion of measure 
counts the number of delayed transactions. This is a first reason for which 
the proof techniques in [7] don’t extend to our context. 

— Transactions induce more complex traces: two transactions might be related 
by several dependency relations since each transaction may contain multi- 
ple reads and writes to different locations. In TSO, each action is a read 
or a write to some location, and two events are related by a single depen- 
dency relation. Also, the number of dependencies between two transactions 
depends on the execution since the set of reads/writes in a transaction 
evolves dynamically. 


Other works [9,13] define decision procedures which are based on the theory of 
regular languages and do not extend to infinite-state programs like in our case. 

Fig. 5. A robust program.
p1: t1: [if (x > y) x = y]
p2: t2: [if (y > x) y = x]

As far as we know, our work provides the first results concerning the decidability
and the complexity of robustness checking in the context of transactions.
The existing work on the verification of robustness for transactional programs
provides either over- or under-approximate analyses. Our commutativity
dependency graphs are similar to the static dependency graphs used in [6,10-12],
but they are more precise, i.e., they reduce the number of false alarms. The static
dependency graphs record happens-before dependencies between transactions
based on a syntactic approximation of the variables accessed by a transaction.
For example, our techniques are able to prove that the program in Fig. 5 is
robust, while this is not possible using static dependency graphs. The latter
would contain a dependency from transaction t1 to t2 and one from t2 to t1 just
because, syntactically, each of the two transactions reads both variables and may
write to one of them. Our dependency graphs take into account the semantics
of these transactions and do not include this happens-before cycle. Other over-
and under-approximate analyses have been proposed in [20]. They are based
on encoding executions into first-order logic, with bounded model checking for the
under-approximate analysis and, for the over-approximate analysis, a sound check
for proving a cut-off bound on the size of the happens-before cycles possible in the
executions of a program. The latter is strictly less precise than our method
based on commutativity dependency graphs. For instance, extending the TPC-C
application with three additional transactions makes the method in [20] fail, while
our method succeeds in proving robustness (the three transactions are for
adding a new product, adding a new warehouse based on the number of customers
and warehouses, and adding a new customer, respectively).
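The gap between the two kinds of graph can be seen on the program of Fig. 5. The sketch below, written under the assumption that both transactions run concurrently from the same snapshot, first computes the syntactic read-write edges that a static dependency graph would record (a cycle between t1 and t2), and then enumerates small initial states to confirm that the concurrent snapshot-isolation run always ends in a state reachable by some serial order. This is only a bounded sanity check of end states over a hypothetical value range, not a proof of trace robustness.

```python
from itertools import product

# Fig. 5: p1 runs t1 = [if (x > y) x = y]; p2 runs t2 = [if (y > x) y = x].
# Each transaction is modelled as a function from a snapshot to the writes it performs.
def t1(s):
    return {"x": s["y"]} if s["x"] > s["y"] else {}


def t2(s):
    return {"y": s["x"]} if s["y"] > s["x"] else {}


# Syntactic access sets, as a static dependency graph would approximate them.
READS = {"t1": {"x", "y"}, "t2": {"x", "y"}}
WRITES = {"t1": {"x"}, "t2": {"y"}}


def syntactic_rw_edges():
    """Read-write edges a -> b whenever a may read a variable that b may write."""
    return {(a, b) for a in READS for b in WRITES if a != b and READS[a] & WRITES[b]}


def serial(s, order):
    s = dict(s)
    for t in order:
        s.update(t(s))
    return s


def concurrent_si(s):
    """Both transactions read the same snapshot and commit if their writes are disjoint."""
    w1, w2 = t1(s), t2(s)
    if set(w1) & set(w2):
        return None  # write-write conflict: one would abort (cannot happen here)
    return {**s, **w1, **w2}


def end_states_serializable(domain=range(-3, 4)):
    """Bounded check: every concurrent SI run ends in a serially reachable state."""
    for x, y in product(domain, repeat=2):
        s = {"x": x, "y": y}
        if concurrent_si(s) not in (serial(s, [t1, t2]), serial(s, [t2, t1])):
            return False
    return True
```

Semantically, at most one of the two guards holds in any snapshot, so only one transaction writes; the syntactic edges cannot see this.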

Finally, the idea of using Lipton’s reduction theory for checking robustness 
has been also used in the context of the TSO memory model [8], but the tech- 
niques are completely different, e.g., the TSO technique considers each update 
in isolation and doesn’t consider non-mover cycles like in our commutativity 
dependency graphs. 
Abstract. We show how to verify that large data center networks satisfy
key properties such as all-pairs reachability under a bounded num-
ber of faults. To scale the analysis, we develop algorithms that identify 
network symmetries and compute small abstract networks from large 
concrete ones. Using counter-example guided abstraction refinement, we 
successively refine the computed abstractions until the given property 
may be verified. The soundness of our approach relies on a novel notion 
of network approximation: routing paths in the concrete network are not 
precisely simulated by those in the abstract network but are guaranteed 
to be “at least as good.” We implement our algorithms in a tool called 
Origami and use them to verify reachability under faults for standard 
data center topologies. We find that Origami computes abstract net- 
works with 1-3 orders of magnitude fewer edges, which makes it possible 
to verify large networks that are out of reach of existing techniques. 


1 Introduction 


Most networks decide how to route packets from point A to B by executing 
one or more distributed routing protocols such as the Border Gateway Protocol 
(BGP) and Open Shortest Path First (OSPF). To achieve end-to-end policy 
objectives related to cost, load balancing, security, etc., network operators author 
configurations for each router. These configurations control various aspects of the 
route computation such as filtering and ranking route information received from 
neighbors, information injection from one protocol to another, and so on. 
This flexibility, however, comes at a cost: configuring individual routers to
enforce the desired policies of the distributed system is complex and error-prone
[15,21]. The problem of configuration is further compounded by three
challenges. The first is network scale. Large networks such as those of cloud
providers can consist of millions of lines of configuration spread across thousands
of devices. The second is that operators must account for the interaction
with external neighbors who may send arbitrary routing messages. Finally, operators
must deal with failures. Hardware failures are common [14] and lead to a
combinatorial explosion of different possible network behaviors.

To combat the complexity of distributed routing configurations, researchers 
have suggested a wide range of network verification [2,13,25] and simulation 
[11,12,23] techniques. These techniques are effective on small and medium-sized 
networks, but they cannot analyze data centers with 1000s of routers and all 
their possible failures. To enable scalable analyses, it seems necessary to exploit 
the symmetries that exist in most large real networks. Indeed, other researchers 
have exploited symmetries to scale verification in the past [3,22]. However, it 
has never been possible to account for failures, as they introduce asymmetries 
that change routing behaviors in unpredictable ways. 

To address this challenge, we develop a new algorithm for verifying reacha- 
bility in networks in the presence of faults, based on the idea of counterexample- 
guided abstraction refinement (CEGAR) [5]. The algorithm starts by factoring 
out symmetries using techniques developed in prior work [3] and then attempts 
verification of the abstract network using an SMT solver. If verification succeeds, 
we are done. However, if verification fails, we examine the counter-example to 
decide whether we have a true failure or we must refine the network further and 
attempt verification anew. By focusing on reachability, the refinement procedure 
can be accelerated by using efficient graph algorithms, such as min cut, to rule 
out invalid abstractions in the middle of the CEGAR loop. 
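The overall loop can be summarized by the following generic CEGAR skeleton. It is a sketch only: `verify`, `is_real` and `refine` are hypothetical callbacks standing in for the SMT-based check of the abstract network, the test for whether a counterexample is realizable in the concrete network, and the node-splitting refinement of Sect. 5, none of which are implemented here.

```python
from typing import Callable, Optional, TypeVar

Abs = TypeVar("Abs")  # an abstract network
Cex = TypeVar("Cex")  # a counterexample (e.g., a set of failed abstract links)


def cegar(initial: Abs,
          verify: Callable[[Abs], Optional[Cex]],
          is_real: Callable[[Cex], bool],
          refine: Callable[[Abs, Cex], Abs],
          max_iters: int = 100) -> str:
    """Generic counterexample-guided abstraction refinement loop."""
    abstraction = initial
    for _ in range(max_iters):
        cex = verify(abstraction)      # None means the property holds
        if cex is None:
            return "verified"
        if is_real(cex):               # the counterexample exists concretely
            return "violated"
        abstraction = refine(abstraction, cex)  # rule out this spurious cex
    return "unknown"
```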

We prove the correctness of our algorithm using a new theory of faulty net- 
works that accounts for the impact of all combinations of k failures. Our key 
insight is that, while routes computed in the abstract network may not simulate 
those of the concrete network exactly, under the right conditions they are guar- 
anteed to approximate them. The approximation relation between concrete and 
abstract networks suffices to verify key properties such as reachability. 

We implemented our algorithms in a tool called Origami and measured their 
performance on common data center network topologies. We find that Origami 
computes abstract networks with 1-3 orders of magnitude fewer edges. This 
reduction speeds verification dramatically and enables verification of networks 
that are out of reach of current state-of-the-art tools [2]. 


2 Key Ideas 


The goal of Origami is to speed up network verification in the presence of faults, 
and it does so by computing small, abstract networks with similar behavior to 
a given concrete network. 
Fig. 1. All graph edges shown correspond to edges in the network topology, and we
draw edges as directed to denote the direction of forwarding eventually determined for
each node by the distributed routing protocols for a fixed destination d. In (a) nodes use
shortest path routing to route to the destination d. (b) shows a compressed network
that precisely captures the forwarding behavior of (a). (c) shows how forwarding is
impacted by a link failure, shown as a red line. (d) shows a compressed network that
is a sound approximation of the original network for any single link failure. (Color figure
online)


As a first approximation, one can view a network as a directed graph cap- 
turing the physical topology, and its routing solution as a subgraph where the 
remaining edges denote the forwarding decision at each node for some fixed des- 
tination. In the absence of faults, given a concrete and abstract network, one 
can define a natural notion of similarity as a graph homomorphism: assigning 
each concrete node a corresponding abstract node such that, for any solution 
to the routing problem, the concrete node forwards “in the same direction” as 
the corresponding abstract node. For example, the concrete network in Fig. 1a 
is related to its abstract counterpart in Fig. 1b according to the node colors. 

Unfortunately, we run into two significant problems when defining abstractions
in this manner in the presence of faults. First, the concrete nodes of Fig. 1a
have at least 2 disjoint paths to the destination whereas abstract nodes of Fig. 1b
have just one path to the destination, so the abstract network does not preserve
the desired fault tolerance properties. Second, consider Fig. 1c, which illustrates
how the routing decisions change when a failure occurs. Here, the nodes (b1 in
particular) no longer route “in the same direction” as the original network or
its abstraction. Hence the invariant connecting concrete and abstract networks
is violated.


Lossy Compression. To achieve compression given a bounded number of link 
failures, we relax the notion of similarity between concrete and abstract nodes: A 
node in the abstract network may merely approximate the behavior of concrete 
nodes. This makes it possible to compress nodes that, in the presence of fail- 
ures, may route differently. In general, when we fail a single link in the abstract 
network, we are over-approximating the failures in the concrete network by fail- 
ing multiple concrete links, possibly more than desired. Nevertheless, the paths 
taken in the concrete network can only deviate so much from the paths found in 
the abstract network: 
Property 1. If a node has a route to the destination in the presence of k link
failures then it has a route that is “at least as good” (as prescribed by the routing
protocol) in the presence of k’ link failures for k’ < k.


This relation suffices to verify important network reliability properties, such as 
reachability, in the presence of faults. Just as importantly, it allows us to achieve 
effective network compression to scale verification. 

Revisiting our example, consider the new abstract network of Fig. 1d. When 
the link between bis and d has failed, bis still captures the behavior of bı pre- 
cisely. However, bp has a better (in this case better means shorter) path to d. 
Despite this difference, if the operator’s goal was to prove reachability to the 
destination under any single fault, then this abstract network suffices. 


From Specification to Algorithm. It is not too difficult to find abstract 
networks that approximate a concrete network; the challenge is finding a valid 
abstract network that is small enough to make verification feasible and yet large 
enough to include sufficiently many paths to verify the given fault tolerance 
property. Rather than attempting to compute a single abstract network with 
the right properties all in one shot, we search the space of abstract networks 
using an algorithm based on counter-example guided abstraction refinement [5]. 

The CEGAR algorithm begins by computing the smallest possible valid 
abstract network. In the example above, this corresponds to the original com- 
pressed network in Fig. 1b, which faithfully approximates the original network 
when there are no link failures. However, if we try to verify reachability in the 
presence of a single fault, we will conclude that nodes $ and @ have no route to 
the destination when the link between b and d fails. The counterexample due to 
this failure could of course be spurious (and indeed it is). Fortunately, we can 
easily distinguish whether such a failure is due to lack of connectivity or an arti- 
fact of over-abstracting, by calculating the number of corresponding concrete 
failures. In this example a failure on the link (b, d) corresponds to 3 concrete 
failures. Since we are interested in verifying reachability for a single failure this 
cannot constitute an actual counterexample. 
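The spuriousness test described above only needs the topology abstraction f: an abstract link failure stands for every concrete link whose endpoints map onto it. The sketch below counts those concrete failures and flags the counterexample as spurious when the count exceeds k. Node names are hypothetical, links are treated as directed pairs as listed, and the test is only a sufficient condition for spuriousness.

```python
from typing import Dict, Hashable, Iterable, Tuple

Link = Tuple[Hashable, Hashable]


def concrete_failures(abstract_failures: Iterable[Link],
                      concrete_links: Iterable[Link],
                      f: Dict[Hashable, Hashable]) -> int:
    """Number of concrete links that an abstract failure scenario forces to fail."""
    failed = set(abstract_failures)
    return sum(1 for (u, v) in concrete_links if (f[u], f[v]) in failed)


def is_spurious(abstract_failures, concrete_links, f, k) -> bool:
    """A counterexample needing more than k concrete failures cannot be real."""
    return concrete_failures(abstract_failures, concrete_links, f) > k


# Hypothetical example: three concrete nodes merged into one abstract node "b".
links = [("b1", "d"), ("b2", "d"), ("b3", "d")]
f = {"b1": "b", "b2": "b", "b3": "b", "d": "d"}
```

Failing the abstract link ("b", "d") here costs three concrete failures, so for k = 1 the counterexample is rejected as spurious.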

The next step is to refine our abstraction by splitting some of the abstract 
nodes. The idea is to use the counterexample from the previous iteration to 
split the abstract network in a way that avoids giving rise to the same spurious 
counterexample in the next iteration (Sect. 5). Doing so results in the somewhat
larger network of Fig. 1d. A second verification pass over this larger network 
takes longer, but succeeds. 


3 The Network Model 


Though there are a wide variety of routing protocols in use today, they share 
a lot in common. Griffin et al. [16] showed that protocols like BGP and others 
solve instances of the stable paths problem, a generalization of the shortest paths 
problem, and Sobrinho [24] demonstrated their semantics and properties can be 
modelled using routing algebras. We extend these foundations by defining stable
paths problems with faults (SPPFs), an extension of the classic Stable Paths
Problem that admits the possibility of a bounded number of link failures. In
later sections, we use this network model to develop generic network compression
algorithms and reason about their correctness.


Stable Path Problems with Faults (SPPFs): An SPPF is an instance of the 
stable paths problem with faults. Informally, each instance defines the routing 
behavior of an operational network. The definition includes both the network 
topology as well as the routing policy. The policy specifies the way routing mes- 
sages are transformed as they travel along links and through the user-configured 
import and export filters/transformers of the devices, and also how the preferred 
routes are chosen at a given device. In our formulation, each problem instance 
also incorporates a specification of the possible failures and their impact on the 
routing solutions. 
Formally, an SPPF is a tuple with six components: 


1. A graph G = (V, E) denoting the network topology.

2. A set of “attributes” (i.e., routing messages) $A_\infty = A \cup \{\infty\}$ that may be
exchanged between devices. The symbol ∞ represents the absence of a route.

3. A destination $d \in V$ and its initial route announcement $a_d \in A$. For simplicity,
each SPPF has exactly one destination (d). (To model a network with many
destinations, one would use a set of SPPFs.)

4. A partial order $\prec\ \subseteq A_\infty \times A_\infty$ ranks attributes. If $a \prec b$ then we say route a
is preferred over route b. Any route $a \in A$ is preferred to no route ($a \prec \infty$).

5. A function $\mathit{trans} : E \to A_\infty \to A_\infty$ that denotes how messages are processed
across edges. This function models the route maps and filters that transform
route announcements as they enter or leave routers.

6. A bound k on the maximum number of link failures that may occur.


Examples: By choosing an appropriate set of routing attributes, a preference 
relation and a transfer function, one can model the semantics of commonly used 
routing protocols. For instance, the Routing Information Protocol (RIP) is a 
simple shortest paths protocol. It can be modelled by an SPPF where (1) the set 
of attributes A is the set of integers between 0 and 15 (i.e., the set of permitted 
path lengths), (2) the preference relation is integer inequality so shorter paths 
are preferred, and (3) the transfer function increments the received attribute by 
1 or drops the route if it exceeds the maximum hop count of 15: 


$$\mathit{trans}(e, a) = \begin{cases} \infty & \text{if } a \geq 15 \\ a + 1 & \text{otherwise} \end{cases}$$
Going beyond simple shortest paths, BGP is a complex, policy-driven proto- 
col that drives the Internet, and increasingly, data centers [18]. Operators often 
choose BGP due to its high expressiveness. We can model a version of BGP (sim- 
plified for presentation) using messages consisting of triples (LP, Comm, Path) 
where LP is an integer-valued local preference, Comm is a set of community values
(which are essentially string tags), and Path is a list of nodes representing
the path a routing message has traversed. The transfer function always adds the
current device to the Path (or drops the message if a loop is detected) and
modifies the LP and Comm components of the attribute according to the device
configuration. For instance, one device may attach a community tag to a route
and another device may filter or modify routes that have the tag attached. The
protocol semantics dictates the preference relation (preferring routes with higher
local preference first, and shorter paths second). A more complete BGP model
is not fundamentally harder to build: it simply has additional attribute fields
and more complex transfer and preference relations [20].
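A minimal Python model of the simplified BGP attribute, assuming Python data classes and representing the absent route ∞ as `None`. The comparison follows the text (higher local preference first, then shorter path); `make_trans` adds the receiving node to the path, drops looping announcements, and then defers to a `policy` callback that stands in for the device configuration. The example policy `block_tagged_at_x` is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

Edge = Tuple[str, str]  # (u, v): u receives the announcement from neighbour v


@dataclass(frozen=True)
class Route:
    lp: int                  # local preference (higher is better)
    comm: FrozenSet[str]     # community tags
    path: Tuple[str, ...]    # nodes the announcement has traversed


def prefers(a: Optional[Route], b: Optional[Route]) -> bool:
    """a is strictly preferred to b; None (no route) is the worst attribute."""
    if a is None:
        return False
    if b is None:
        return True
    return (-a.lp, len(a.path)) < (-b.lp, len(b.path))


Policy = Callable[[Edge, Route], Optional[Route]]


def make_trans(policy: Policy) -> Callable[[Edge, Optional[Route]], Optional[Route]]:
    """Build trans(e, a): extend the path, drop loops, then apply the policy."""
    def trans(edge: Edge, a: Optional[Route]) -> Optional[Route]:
        u, _ = edge
        if a is None or u in a.path:
            return None  # no route, or accepting it would form a loop
        return policy(edge, Route(a.lp, a.comm, a.path + (u,)))
    return trans


def block_tagged_at_x(edge: Edge, r: Route) -> Optional[Route]:
    """Hypothetical route map: node x filters routes tagged "blocked"."""
    return None if edge[0] == "x" and "blocked" in r.comm else r
```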


SPPF Solutions: In a network, routers will repeatedly exchange messages, 
applying their transfer functions to neighbor routes and selecting a current best 
route based on the preference relation, until the network reaches a fixpoint (sta- 
ble state). Interestingly, Griffin et al. [16] showed that all routing solutions can 
be described via a set of local stability constraints. We exploit this insight to 
define a series of logical constraints that capture all possible routing behaviors 
in a setting that includes link failures. More specifically, we define a solution
(aka stable state) S of an SPPF to be a pair $(\mathcal{L}, F)$ of a labelling $\mathcal{L}$ and a failure
scenario F. The labelling $\mathcal{L}$ is an assignment of the final attributes to nodes in
the network. If an attribute a is assigned to node v, we say that node has selected 
(or prefers) that attribute over other attributes available to it. The chosen route 
also determines packet forwarding. If a node X selects a route from neighbor Y, 
then X will forward packets to Y. The failure scenario F is an assignment of 0 
(has not failed) or 1 (has failed) to each edge in the network. 

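A solution $S = (\mathcal{L}, F)$ to an $SPPF = (G, A, a_d, \prec, \mathit{trans}, k)$ is a stable state
satisfying the following conditions:

$$\mathcal{L}(u) = \begin{cases} a_d & u = d \\ \infty & \mathit{choices}_S(u) = \emptyset \\ \min_{\prec}\{a \mid (e, a) \in \mathit{choices}_S(u)\} & \mathit{choices}_S(u) \neq \emptyset \end{cases} \qquad \text{subject to} \quad \sum_{e \in E} F(e) \leq k$$

where the choices from the neighbors of node u are defined as:

$$\mathit{choices}_S(u) = \{(e, a) \mid e = (u, v),\ a = \mathit{trans}(e, \mathcal{L}(v)),\ a \neq \infty,\ F(e) = 0\}$$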


The constraints require that every node has selected the best attribute
(according to its preference relation) amongst those available from its neighbors.
The destination’s label must always be the initial attribute $a_d$. For verification,
this attribute (or parts of it) may be symbolic, which helps model
potentially unknown routing announcements from peers outside our network.
For other nodes u, the selected attribute a is the minimal attribute from the
choices available to u. Intuitively, to find the choices available to u, we consider
the attributes b chosen by neighbors v of u. Then, if the edge between v and
u is not failed, we push b along that edge, modifying it according to the trans 
function. Finally, failure scenarios are constrained so that the sum of the failures 
is at most k. 
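The stability constraints translate directly into a checker for a candidate solution. The sketch below verifies a given labelling and failure scenario rather than computing one, and it instantiates the RIP example from this section. Assumptions: links are undirected and stored as two-element frozensets, a failed link counts once towards k, there are no self-loops, ∞ is `math.inf`, choices are kept as plain attributes (dropping the edge component), and `prefers(a, b)` means a ≺ b.

```python
import math

INF = math.inf  # the attribute ∞ ("no route")


def is_stable(nodes, links, d, a_d, trans, prefers, k, L, F):
    """Check the SPPF stability constraints for labelling L and failure scenario F.

    links: set of frozenset({u, v}); F[link] is 1 if the link failed, else 0.
    trans((u, v), a): attribute u obtains from neighbour v's attribute a.
    """
    if sum(F[e] for e in links) > k:      # at most k failures
        return False
    if L[d] != a_d:                       # the destination keeps its announcement
        return False
    for u in nodes:
        if u == d:
            continue
        choices = set()
        for e in links:
            if u in e and F[e] == 0:      # only live links offer routes
                (v,) = e - {u}
                a = trans((u, v), L[v])
                if a != INF:
                    choices.add(a)
        if not choices:
            if L[u] != INF:               # no neighbour offers a route
                return False
        elif L[u] not in choices or any(prefers(c, L[u]) for c in choices):
            return False                  # L[u] must be a minimal choice
    return True


# RIP instance from the text: hop counts 0..15, shorter paths preferred.
def rip_trans(edge, a):
    return INF if a >= 15 else a + 1


def rip_prefers(a, b):
    return a < b
```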


4 Network Approximation Theory 


Given a concrete SPPF and an abstract $\widehat{SPPF}$, a network abstraction is a pair of
functions (f, h) that relate the two. The topology abstraction $f : V \to \hat{V}$ maps
each node in the concrete network to a node in the abstract network, while the
attribute abstraction $h : A_\infty \to \hat{A}_\infty$ maps a concrete attribute to an abstract
attribute. The latter allows us to relate networks running protocols where nodes
may appear in the attributes (e.g., as in the Path component of BGP).

The goal of Origami is to compute compact SPPFs that may be used for 
verification. These compact SPPFs must be closely related to their concrete 
counterparts. Otherwise, properties verified on the compact SPPF will not be 
true of their concrete counterpart. Section 4.1 defines label approximation, which 
provides an intuitive, high-level, semantic relationship between abstract and 
concrete networks. We also explain some of the consequences of this defini- 
tion and its limitations. Unfortunately, while this broad definition serves as an 
important theoretical objective, it is difficult to use directly in an efficient algo- 
rithm. Section 4.2 continues our development by explaining two well-formedness 
requirements of network policies that play a key role in establishing label approx- 
imation indirectly. Finally, Sect. 4.3 defines effective SPPF approximation for 
well-formed SPPFs. This definition is more conservative than label approxima- 
tion, but has the advantage that it is easier to work with algorithmically and, 
moreover, it implies label approximation. 


4.1 Label Approximation 


Intuitively, we say the abstract $\widehat{SPPF}$ label-approximates the concrete SPPF
when SPPF has at least as good a route at every node as $\widehat{SPPF}$ does.


Definition 1 (Label Approximation). Consider any solutions S to SPPF
and $\hat{S}$ to $\widehat{SPPF}$ and their respective labelling components $\mathcal{L}$ and $\hat{\mathcal{L}}$. We say $\widehat{SPPF}$
label-approximates SPPF when $\forall u \in V.\ h(\mathcal{L}(u)) \preceq \hat{\mathcal{L}}(f(u))$.


If we can establish a label approximation relation between a concrete and an 
abstract network, we can typically verify a number of properties of the abstract 
network and be sure they hold of the concrete network. However, the details of 
exactly which properties we can verify depend on the specifics of the preference 
relation (<). For example, in an OSPF network, preference is determined by 
weighted path length. Therefore, if we know an abstract node has a path of 
weighted length n, we know that its concrete counterparts have paths of weighted 
length of at most n. More importantly, since “no route” is the worst route, we 
know that if a node has any route to the destination in the abstract network, so
do its concrete counterparts. 
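Definition 1 can be checked for one particular pair of solutions with a few lines of Python. The sketch assumes labellings are dictionaries from nodes to attributes, `f` is a dictionary, `h` is a function, and `prefers_or_equal(a, b)` means a ⪯ b; the definition itself quantifies over all solutions, which a single check like this does not cover. The example uses OSPF-style path lengths with hypothetical node names.

```python
import math

INF = math.inf  # "no route", the worst attribute


def label_approximates(concrete_L, abstract_L, f, h, prefers_or_equal) -> bool:
    """For every concrete node u: h(L(u)) is at least as good as L_hat(f(u))."""
    return all(prefers_or_equal(h(a), abstract_L[f[u]]) for u, a in concrete_L.items())


def identity(a):
    return a


def shorter_or_equal(a, b):
    return a <= b  # path lengths: smaller is better, INF is worst


f = {"b1": "b", "b2": "b", "d": "d"}
```

Because ∞ is the worst attribute, whenever the check holds and an abstract node has a finite label, each of its concrete counterparts has a finite label too, which is exactly the reachability argument in the text.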


Limitations. Some properties are beyond the scope of our tool (independent of 
the preference relation). For example, our model cannot reason about quantita- 
tive properties such as bandwidth, probability of congestion, or latency. 


4.2 Well-Formed SPPFs 


Not all SPPFs are well-behaved. For example, some never converge and oth- 
ers do not provide sensible models of any real network. To avoid dealing with 
such poorly-behaved models, we demand henceforth that all SPPFs are well- 
formed. Well-formedness entails that an SPPF is strictly monotonic and isotonic: 
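$$\forall a, e.\ a \neq \infty \Rightarrow a \prec \mathit{trans}(e, a) \qquad \text{(strict monotonicity)}$$
$$\forall a, b, e.\ a \preceq b \Rightarrow \mathit{trans}(e, a) \preceq \mathit{trans}(e, b) \qquad \text{(isotonicity)}$$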
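Over a finite sample of attributes and edges, the two well-formedness conditions can be tested by brute force. This is a sketch for intuition, assuming `prefers(a, b)` encodes a ≺ b and `prefers_or_equal(a, b)` encodes a ⪯ b; passing the test on a sample does not prove the properties for all attributes. The policy `odd` is a made-up counterexample, not one from the paper.

```python
import math
from itertools import product

INF = math.inf  # the "no route" attribute ∞


def strictly_monotonic(attrs, edges, trans, prefers):
    """For all sampled a != INF and e: a is strictly preferred to trans(e, a)."""
    return all(prefers(a, trans(e, a)) for a in attrs if a != INF for e in edges)


def isotonic(attrs, edges, trans, prefers_or_equal):
    """For all sampled a, b, e: a <= b implies trans(e, a) <= trans(e, b)."""
    return all(prefers_or_equal(trans(e, a), trans(e, b))
               for a, b in product(attrs, repeat=2) if prefers_or_equal(a, b)
               for e in edges)


def rip(e, a):
    """RIP-style policy: one hop per link, dropped at the 15-hop limit."""
    return INF if a >= 15 else a + 1


def odd(e, a):
    """Hypothetical policy: monotonic, but it reverses the order of 0 and 1."""
    return {0: 5, 1: 2}.get(a, INF)
```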


Fig. 2. Concrete network (left) and its corresponding abstraction (right). Nodes c1, c2
prefer to route through b1 (resp. b2), or g over a. Node b1 (resp. b2) drops routing
messages that have traversed b2 (resp. b1). Red lines indicate a failed link. Dotted lines
show a topologically available but unused link. A purple arrow shows a route unusable
by traffic from b1. (Color figure online)


Monotonicity and isotonicity properties are often cited [7,8] as desirable prop- 
erties of routing policies because they guarantee network convergence and pre- 
vent persistent oscillation. In practice too, prior studies have revealed that almost 
all real network configurations have these properties [13,19]. 

In our case, these properties help establish additional invariants that tie
the routing behavior of concrete and abstract networks together. To gain some
intuition as to why, consider the networks of Fig. 2. The concrete network on
the left runs BGP with the routing policy that node c1 (and c2) prefers to route
through node g instead of a, and that b1 drops announcements coming from
b2. In this scenario, the similarly configured abstract node $\hat{b}_{12}$ can reach the
destination: it simply takes a route that happens to be less preferred by $\hat{c}_{12}$
than it would if there had been no failure. However, in the concrete analogue,
b1 is unable to reach the destination because c1 only sends it the route through
b2, which it cannot use. In this case, the concrete network has more topological
paths than the abstract network, but, counterintuitively, due to the network's
routing policy, this turns out to be a disadvantage. Hence having more paths
does not necessarily make nodes more reachable. As a consequence, in general,
abstract networks cannot soundly overapproximate the number of failures in a
concrete network, an important property for the soundness of our theory.

The underlying issue here is that the networks of Fig. 2 are not isotonic: suppose
$\mathcal{L}'(c_1)$ is the route from $c_1$ to the destination through node a. We have that
$\mathcal{L}(c_1) \prec \mathcal{L}'(c_1)$, but since the transfer function over $(b_1, c_1)$ drops routes that have
traversed node $b_2$, we have that $\mathit{trans}((b_1, c_1), \mathcal{L}(c_1)) \not\preceq \mathit{trans}((b_1, c_1), \mathcal{L}'(c_1))$.
Notice that $\mathcal{L}'(c_1)$ is essentially the route that the abstract network uses, i.e.
$h(\mathcal{L}'(c_1)) = \hat{\mathcal{L}}(\hat{c}_{12})$; hence the formula above implies that $h(\mathcal{L}(b_1)) \not\preceq \hat{\mathcal{L}}(\hat{b}_{12})$,
which violates the notion of label approximation. Fortunately, if a network is
strictly monotonic and isotonic, such situations never arise. Moreover, we check
these properties with an SMT solver using a local and efficient test.
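The failure of isotonicity in Fig. 2 can be reproduced on two routes. The Python sketch below is a brute-force stand-in for the local SMT test; the route encoding (a tuple of traversed nodes, `None` for no route) and the preference key are assumptions chosen to mirror the prose, not Origami's IR. It finds exactly the pair $(\mathcal{L}(c_1), \mathcal{L}'(c_1))$ as the violation.

```python
from itertools import product

# Assumed encoding of Fig. 2: a route is the tuple of nodes it traverses,
# ending at the destination d; None means "no route".


def key(route):
    """Smaller key = more preferred: routes via g beat routes via a,
    then shorter beats longer; having no route is worst."""
    if route is None:
        return (2, 0)
    return (0 if "g" in route else 1, len(route))


def leq(r1, r2):  # r1 ≼ r2
    return key(r1) <= key(r2)


def trans(edge, route):
    """Route that edge[0] derives from the route announced by edge[1].
    b1 drops routes that traversed b2, and vice versa."""
    u, _v = edge
    if route is None:
        return None
    if (u == "b1" and "b2" in route) or (u == "b2" and "b1" in route):
        return None
    return (u,) + route


def isotonicity_violations(edges, routes):
    """All (e, r1, r2) with r1 ≼ r2 but trans(e, r1) not ≼ trans(e, r2)."""
    return [
        (e, r1, r2)
        for e, r1, r2 in product(edges, routes, routes)
        if leq(r1, r2) and not leq(trans(e, r1), trans(e, r2))
    ]


L = ("c1", "b2", "g", "d")  # c1's preferred route, through g
L_prime = ("c1", "a", "d")  # the less preferred route through a
violations = isotonicity_violations([("b1", "c1")], [L, L_prime, None])
print(violations)
# [(('b1', 'c1'), ('c1', 'b2', 'g', 'd'), ('c1', 'a', 'd'))]
```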


4.3 Effective SPPF Approximation 


We seek abstract networks that label-approximate given concrete networks. 
Unfortunately, to directly check that a particular abstract network label approx- 
imates a concrete network one must effectively compute their solutions. Doing 
so would defeat the entire purpose of abstraction, which seeks to analyze large 
concrete networks without the expense of computing their solutions directly. 

In order to turn approximation into a useful computational tool, we define 
effective approximation, a set of simple conditions on the abstraction functions f 
and h that are local and can be checked efficiently. When true, these conditions
imply label approximation. Intuitively, effective approximations impose three
main restrictions on the abstraction functions:


1. The topology abstraction conforms to the $\forall\exists$-abstraction condition; this
requires that there is an abstract edge $(\hat{u}, \hat{v})$ iff for every concrete node u
such that $f(u) = \hat{u}$ there is some node v such that $f(v) = \hat{v}$ and $(u, v) \in E$.

2. The abstraction preserves the rank of attributes (rank-equivalence):

$\forall a, b.\ a \preceq b \implies h(a) \preceq h(b)$

3. The transfer function and the abstraction functions commute (trans-equivalence):

$\forall e, a.\ h(\mathit{trans}(e, a)) = \widehat{\mathit{trans}}(f(e), h(a))$
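The topological condition (1) is purely local and easy to check directly. The Python sketch below tests it for a given concrete graph, node abstraction f, and abstract edge set. The graphs are hypothetical toy examples, and links are stored in both directions to model undirected connections.

```python
from itertools import product


def undirected(pairs):
    """Store each undirected link in both directions."""
    return {(u, v) for u, v in pairs} | {(v, u) for u, v in pairs}


def forall_exists_holds(concrete_edges, f, abstract_edges):
    """(û, v̂) must be an abstract edge exactly when every concrete u with
    f(u) = û has at least one neighbour v with f(v) = v̂."""
    succ, members = {}, {}
    for u, v in concrete_edges:
        succ.setdefault(u, set()).add(v)
    for u in {x for edge in concrete_edges for x in edge}:
        members.setdefault(f[u], set()).add(u)
    for au, av in product(members, members):
        every_member_has_witness = all(
            any(f[v] == av for v in succ.get(u, ())) for u in members[au]
        )
        if every_member_has_witness != ((au, av) in abstract_edges):
            return False
    return True


f = {"d": "d", "a1": "a", "a2": "a", "b1": "b", "b2": "b"}
E_hat = undirected([("a", "d"), ("b", "a")])
E_ok = undirected([("a1", "d"), ("a2", "d"), ("b1", "a1"), ("b2", "a2")])
E_bad = undirected([("a1", "d"), ("a2", "d"), ("b1", "a1"), ("b2", "d")])
print(forall_exists_holds(E_ok, f, E_hat))   # True
print(forall_exists_holds(E_bad, f, E_hat))  # False: b2 links to d, not to an a-node
```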


We prove that when these conditions hold, we can approximate any solution of 
the concrete network with a solution of the abstract network. 


Theorem 1. Given a well-formed $\mathit{SPPF}$ and its effective approximation $\widehat{\mathit{SPPF}}$,
for any solution $S \in \mathit{SPPF}$ there exists a solution $\hat{S} \in \widehat{\mathit{SPPF}}$ such that their
labelling functions are label approximate.
5 The Verification Procedure


The first step of verification is to compute a small abstract network that satis- 
fies our SPPF effective approximation conditions. We do so by grouping network 
nodes and edges with equivalent policy and checking the forall-exists topological 
condition, using an algorithm reminiscent of earlier work [3]. Typically, how- 
ever, this minimal abstraction will not contain enough paths to prove any fault- 
tolerance property. To identify a finer abstraction for which we can prove a 
fault-tolerance property we repeatedly: 


1. Search the set of candidate refinements for the smallest plausible abstraction. 

2. If the candidate abstraction satisfies the desired property, terminate the pro- 
cedure. (We have successfully verified our concrete network.) 

3. If not, examine whether the returned counterexample is an actual counterexample.
We do so by computing the number of concrete failures and checking
that it does not exceed the desired bound of link failures. (If so, we have
found a property violation.)

4. If not, use the counterexample to learn how to expand the abstract network 
into a larger abstraction and repeat. 
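The four steps form a counterexample-guided loop. The Python skeleton below shows only its control flow; the callbacks `search`, `check`, `concrete_failures`, and `learn` are hypothetical placeholders for the candidate search, the SMT query, the failure count, and counterexample-based learning. They are not Origami's API.

```python
def verify_fault_tolerance(initial, k, search, check, concrete_failures, learn,
                           max_iters=100):
    """Counterexample-guided refinement loop (control flow only)."""
    abstraction = initial
    for _ in range(max_iters):
        candidate = search(abstraction)              # step 1: smallest plausible
        counterexample = check(candidate, k)         # step 2: None means verified
        if counterexample is None:
            return ("verified", candidate)
        if concrete_failures(counterexample) <= k:   # step 3: is it real?
            return ("violation", counterexample)
        abstraction = learn(candidate, counterexample)  # step 4: refine and repeat
    return ("unknown", abstraction)


# Toy stubs: abstractions are sizes, and only size >= 3 verifies.
result = verify_fault_tolerance(
    initial=0,
    k=1,
    search=lambda a: a + 1,
    check=lambda cand, k: None if cand >= 3 else {"concrete_failures": 5},
    concrete_failures=lambda cex: cex["concrete_failures"],
    learn=lambda cand, cex: cand,
)
print(result)  # ('verified', 3)
```

Passing the steps as functions keeps the loop independent of how abstractions and counterexamples are represented.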




Fig. 3. Eight nodes in (a) are represented using two nodes in the abstract network (b). 
Pictures (c) and (d) show two possible ways to refine the abstract network (b). 


Both the search for plausible candidates and the way we learn a new abstrac- 
tion to continue the counterexample-guided loop are explained below. 


5.1 Searching for Plausible Candidates 


Though we might know an abstraction is not sufficient to verify a given fault-tolerance
property, there are many possible refinements. For example,
Fig. 3(a) presents a simple concrete network that will tolerate a single link failure,
and Fig. 3(b) presents an initial abstraction. The initial abstraction will
not tolerate any link failure, so we must refine the network. To do so, we
choose an abstract node to divide into two abstract nodes for the next iteration.
We must also decide which concrete nodes correspond to each abstract node.
For example, in Fig. 3(c), node $\hat{a}$ has been split into $\hat{a}_{13}$ and $\hat{a}_{24}$. The subscripts
indicate the assignment of concrete nodes to abstract ones.
A significant complication is that once we have generated a new abstraction,
we must check that it continues to satisfy the effective approximation conditions,
and if not, we must do more work. Figure 3(c) satisfies those conditions, but if
we were to split $\hat{a}$ into $\hat{a}_{12}$ and $\hat{a}_{34}$ rather than $\hat{a}_{13}$ and $\hat{a}_{24}$, the forall-exists
condition would be violated: some of the concrete nodes associated with $\hat{b}$ are
connected to the concrete nodes in $\hat{a}_{12}$ but not to the ones in $\hat{a}_{34}$, and vice versa.
To repair the violation of the forall-exists condition, we need to split additional
nodes, in this case the $\hat{b}$ node, giving rise to Fig. 3(d).
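Repairing the forall-exists condition after a split can be sketched as partition refinement. The Python code below repeatedly splits any group whose members are adjacent to different sets of groups. It assumes that an abstract edge is present whenever some concrete edge joins two groups, so uniform neighbour sets suffice for the condition. This is standard signature-based refinement offered as an illustration. It is not necessarily Origami's exact procedure, which also uses min-cut heuristics to choose splits. The eight-node graph is hypothetical.

```python
def refine_until_forall_exists(adjacency, group_of):
    """Split abstract groups until all members of a group are adjacent to
    exactly the same set of groups; returns the partition as sorted lists."""
    groups = dict(group_of)
    while True:
        # Signature: current group plus the set of groups reachable in one hop.
        signature = {
            u: (groups[u], frozenset(groups[v] for v in adjacency[u]))
            for u in adjacency
        }
        if len(set(signature.values())) == len(set(groups.values())):
            break  # no group was split, so the partition is stable
        groups = signature  # finer partition; signatures serve as group ids
    partition = {}
    for u, g in groups.items():
        partition.setdefault(g, []).append(u)
    return sorted(sorted(members) for members in partition.values())


adjacency = {
    "d": ["a1", "a2", "a3", "a4"],
    "a1": ["d", "b1", "b2"], "a2": ["d", "b1", "b2"],
    "a3": ["d", "b3", "b4"], "a4": ["d", "b3", "b4"],
    "b1": ["a1", "a2"], "b2": ["a1", "a2"],
    "b3": ["a3", "a4"], "b4": ["a3", "a4"],
}
coarse = {u: u[0] for u in adjacency}  # groups "a", "b", "d"
split_a = {**coarse, "a1": "a12", "a2": "a12", "a3": "a34", "a4": "a34"}
print(refine_until_forall_exists(adjacency, coarse))
# [['a1', 'a2', 'a3', 'a4'], ['b1', 'b2', 'b3', 'b4'], ['d']]
print(refine_until_forall_exists(adjacency, split_a))
# [['a1', 'a2'], ['a3', 'a4'], ['b1', 'b2'], ['b3', 'b4'], ['d']]
```

Splitting group a forces group b to split as well, mirroring the repair step described above.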

Overall, the process of splitting nodes and then recursively splitting further 
nodes to repair the forall-exists condition generates many possible candidate 
abstractions to consider. A key question is which candidate we should select to
proceed with the abstraction refinement algorithm.

One consideration is size: A smaller abstraction avoids taxing the verifier, 
which is the ultimate goal. However, there are many small abstractions that we 
can quickly dismiss. Technically, we say an abstraction is plausible if all nodes
of interest have at least k + 1 (edge-disjoint) paths to the destination. In an
implausible abstraction, k failures can disconnect some node from the destination. To check whether an
abstraction is plausible, we compute the min-cut of the graph. Figure 3(d) is an
example of an implausible abstraction that arose after a poorly chosen split of
node $\hat{a}$. In this case, no node has 2 or more paths to the destination, and hence
they might not be able to reach the destination when there is a failure.
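The plausibility test reduces to a min-cut computation. By Menger's theorem, the size of the minimum edge cut between a node and the destination equals the maximum number of edge-disjoint paths between them. The Python sketch below computes that number with Edmonds-Karp max-flow, giving each undirected link unit capacity. The two small graphs are hypothetical analogues of a single-path abstraction and its split refinement.

```python
from collections import deque


def min_cut_size(links, source, target):
    """Maximum number of edge-disjoint source-target paths in an undirected
    graph (unit capacity per link), computed with Edmonds-Karp max-flow.
    By Menger's theorem this equals the size of the minimum edge cut."""
    cap, adj = {}, {}
    for u, v in links:
        for x, y in ((u, v), (v, u)):
            cap[(x, y)] = cap.get((x, y), 0) + 1
            adj.setdefault(x, set()).add(y)
    flow = 0
    while True:
        parent = {source: None}  # BFS for a shortest augmenting path
        queue = deque([source])
        while queue and target not in parent:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in parent and cap[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if target not in parent:
            return flow
        y = target  # push one unit of flow back along the path
        while parent[y] is not None:
            x = parent[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1


def plausible(links, nodes_of_interest, dest, k):
    """Every node of interest needs at least k + 1 edge-disjoint paths."""
    return all(min_cut_size(links, u, dest) >= k + 1 for u in nodes_of_interest)


coarse = [("b", "a"), ("a", "d")]
refined = [("b", "a13"), ("b", "a24"), ("a13", "d"), ("a24", "d")]
print(plausible(coarse, ["a", "b"], "d", k=1))              # False
print(plausible(refined, ["a13", "a24", "b"], "d", k=1))    # True
```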

Clearly verification using an implausible abstraction will fail. Instead of con- 
sidering such abstractions as candidates for running verification on, the refine- 
ment algorithm tries refining them further. A key decision the algorithm needs to 
make when refining an abstraction is which abstract node to split. For instance, 
the optimal refinement of Fig. 3(b) is Fig. 3(c). If we were to split node $\hat{b}$ instead
of $\hat{a}$, we would end up with a sub-optimal (in terms of size) abstraction. Intuitively,
splitting a node that lies on the min-cut and can reach the destination
(e.g. $\hat{a}$) will increase the number of paths that its neighbors on the unreachable
part of the min-cut (e.g. $\hat{b}$) can use to reach the destination.

To summarize, the search for new candidate abstractions involves (1) splitting 
nodes in the initial abstraction, (2) repairing the abstraction to ensure the forall- 
exists condition holds, (3) checking that the generated abstraction is plausible, 
and if not, (4) splitting additional nodes on the min cut. This iterative process 
will often generate many candidates. The breadth parameter of the search bounds 
the total number of plausible candidates we will generate in between verification 
efforts. Of all the plausible candidates generated, we choose the smallest one to 
verify using the SMT solver. 


5.2 Learning from Counterexamples 


Any nodes of an abstraction that have a min cut of less than k+1 definitely can- 
not tolerate k faults. If an abstraction is plausible, it satisfies a necessary condi- 
tion for source-destination connectivity, but not a sufficient one—misconfigured 
routing policy can still cause nodes to be unreachable by modifying and/or subse- 
quently dropping routing messages. For instance, the abstract network of Fig. 3c 
is plausible for one failure, but if its routing policy blocks routes of either $\hat{a}_{13}$ or
$\hat{a}_{24}$, then the abstract network will not be 1-fault tolerant. Indeed, it is the complexity
of routing policy that necessitates a heavy-weight verification procedure
in the first place, rather than a simpler graph algorithm alone. 

In a plausible abstraction, if the verifier computes a solution to the network
that violates the desired fault-tolerance property, some node could not reach the
destination because one or more of its paths to the destination could not be used
to route traffic. We use the generated counterexample to learn edges that could
not be used to route traffic due to the policy on them. To do so, we inspect the
computed solution to find nodes $\hat{u}$ that (1) lack a route to the destination (i.e.
$\hat{\mathcal{L}}(\hat{u}) = \infty$), (2) have a neighbor $\hat{v}$ that has a valid route to the destination, and
(3) are connected to $\hat{v}$ by a link that has not failed. These conditions imply that $\hat{u}$ lacks
a valid route to the destination not because link failures disabled all paths to the
destination, but because the network policy dropped some routes. For example, in
Fig. 3c, consider the case where $\hat{b}$ does not advertise routes from $\hat{a}_{13}$ and
$\hat{a}_{24}$; if the link between $\hat{a}_{13}$ and $\hat{d}$ fails, then $\hat{a}_{13}$ has no route to the destination and
we learn that the edge $(\hat{b}, \hat{a}_{13})$ cannot be used. In fact, since $\hat{a}_{13}$ and $\hat{a}_{24}$ belonged
to the same abstract group $\hat{a}$ before we split them, their routing policies are equal
modulo the abstraction function, by trans-equivalence. Hence, we can infer that in
a symmetric scenario, the link $(\hat{b}, \hat{a}_{24})$ will also be unusable.

Given a set of unusable edges learned from a counterexample, we restrict the
min-cut problems that define the plausible abstractions by disallowing the use of
those edges. Essentially, we enrich the refinement algorithm's topology-based
analysis (based on min-cut) with knowledge about the policy; the algorithm will
have to generate abstractions that are plausible without using those edges. With
those edges disabled, the refinement process continues as before. 
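The three conditions for learning unusable edges, together with the symmetric inference, fit in a short function. The Python sketch below encodes the Fig. 3c scenario with assumed hop-count labels, where `math.inf` means "no route". The `siblings` map, which records nodes split from the same abstract group, is a hypothetical input. The learned edges would then be excluded from the subsequent min-cut computations.

```python
import math

NO_ROUTE = math.inf


def learn_unusable_edges(labels, neighbours, failed, siblings=None):
    """Return directed edges (v, u) whose policy must have dropped v's route:
    u has no route, neighbour v has one, and the link between them is up.
    `siblings` maps a node to nodes split from the same abstract group,
    which by trans-equivalence share its policy."""
    siblings = siblings or {}
    learned = set()
    for u, label in labels.items():
        if label != NO_ROUTE:
            continue  # condition (1): u must lack a route
        for v in neighbours[u]:
            if labels[v] == NO_ROUTE:
                continue  # condition (2): v must have a valid route
            if frozenset((u, v)) in failed:
                continue  # condition (3): the link itself must not have failed
            learned.add((v, u))
            for s in siblings.get(u, ()):  # symmetric scenario
                if v in neighbours[s]:
                    learned.add((v, s))
    return learned


neighbours = {
    "d": ["a13", "a24"],
    "a13": ["d", "b"],
    "a24": ["d", "b"],
    "b": ["a13", "a24"],
}
labels = {"d": 0, "a24": 1, "b": 2, "a13": NO_ROUTE}  # assumed hop counts
failed = {frozenset(("a13", "d"))}
siblings = {"a13": ["a24"], "a24": ["a13"]}
print(sorted(learn_unusable_edges(labels, neighbours, failed, siblings)))
# [('b', 'a13'), ('b', 'a24')]
```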


6 Implementation 


Origami uses the Batfish network analysis framework [12] to parse network configurations
and then translates them into a pure functional intermediate representation
(IR) designed for network verification. This IR represents the structure
of routing messages and the semantics of transfer and preference relations using 
standard functional data structures. 

The translation generates a separate functional program for each destina- 
tion subnet. In other words, if a network has 100 top-of-rack switches and each 
such switch announces the subnets for 30 adjacent hosts, then Origami gener- 
ates 100 functional programs (i.e. problem instances). We separately apply our 
algorithms to each problem instance, converting the functional program to an 
SMT formula when necessary according to the algorithm described earlier. Since 
vendor routing configuration languages have limited expressive power (e.g., no 
loops or recursion) the translation requires no user-provided invariants. We use 
Z3 [10] to determine satisfiability of the SMT problems. Solving the problems 
separately (and in parallel) provides a speedup over solving the routing problem 
for all destinations simultaneously: The individual problems are specialized to a 
particular destination, which creates opportunities for optimizations that reduce
the problem size, such as dead code elimination.


Optimizing Refinement: During the course of implementing Origami, we dis- 
covered a number of optimizations to the refinement phase. 


— If the min-cut between the destination and a vertex u is less than or equal 
to the desired number of disjoint paths, then we do not need to compute 
another min-cut for the nodes in the unreachable portion of vertices T; we 
know nodes in T can be disconnected from the destination. This significantly 
reduces the number of min-cut computations. 

— We stop exploring abstractions that are larger in size than the smallest plau- 
sible abstraction computed since the last invocation of the SMT solver. 

— We bias our refinement process to explore the smallest abstractions first.
When combined with the previous optimization, this prunes some unnecessarily
large abstractions from our search space.


Minimizing Counterexamples: When the SMT solver returns a counterex- 
ample, it often uses the maximum number of failures. This is not surprising as 
maximizing failures simplifies the SMT problem. Unfortunately, it also confounds 
our analysis to determine whether a counterexample is real or spurious. 


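Topo | Con V/E    | Fail | Abs V/E | Ratio        | Abs Time | SMT Calls | SMT Time
FT20 | 500/8000   | 1    | 9/20    | 55.5/400     | 0.1      | 1         | 0.1
FT20 | 500/8000   | 3    | 40/192  | 12.5/41.67   | 1.0      | 2         | 7.6
FT20 | 500/8000   | 5    | 96/720  | 5.20/11.1    | 2.5      | 2         | 248
FT20 | 500/8000   | 10   | 59/440  | 8.48/18.18   | 0.9      | -         | -
FT40 | 2000/64000 | 1    | 12/28   | 166.7/2285.7 | 0.1      | 1         | 0.1
FT40 | 2000/64000 | 3    | 45/220  | 44.4/290.9   | 33       | 2         | 12.3
FT40 | 2000/64000 | 5    | 109/880 | 18.34/72.72  | 762.3    | 2         | 184.1
SP40 | 2000/64000 | 1    | 13/32   | 153.8/2000   | 0.2      | 1         | 0.1
SP40 | 2000/64000 | 3    | 39/176  | 51.3/363.6   | 30.3     | 1         | 2
SP40 | 2000/64000 | 5    | 79/522  | 25.3/122.6   | 372.2    | 1         | 22
FbFT | 744/10880  | 1    | 20/66   | 37.2/164.8   | 0.1      | 3         | 1
FbFT | 744/10880  | 3    | 57/360  | 13.05/30.22  | 1        | 4         | 18.3
FbFT | 744/10880  | 5    | 93/684  | 8/15.9       | 408.9    | -         | -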


Fig. 4. Compression results. Topo: the network topology. Con V/E: Number of 
nodes/edges of concrete network. Fail: Number of failures. Abs V/E: Number of 
nodes/edges of the best abstraction. Ratio: Compression ratio (nodes/edges). Abs 
Time: Time taken to find abstractions (sec.). SMT Calls: Number of calls to the 
SMT solver. SMT Time: Time taken by the SMT solver (sec.). 


To mitigate the effect of this problem, we could ask the solver to minimize 
the returned counterexample, returning a counterexample that corresponds to 
the fewest concrete link failures. We could do so by providing the solver with 
additional constraints specifying the number of concrete links that correspond 
to each abstract link and then asking the solver to return a counterexample that
minimizes this sum of concrete failures. Of course, doing so requires that we solve a
more expensive optimization problem. Instead, given an initial (possibly spurious)
counterexample, we simply ask the solver to find a new counterexample
that additionally satisfies this constraint. If it succeeds, we have found a real
counterexample. If it fails, we use the initial counterexample to refine our abstraction.
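The effect of the additional constraint can be illustrated without a solver. In the Python sketch below, exhaustive enumeration of failure sets stands in for the constrained SMT query. A failure set qualifies only if it violates the property and its summed concrete link count stays within k. The link names, per-link counts, and the `violates` predicate are hypothetical.

```python
from itertools import combinations


def find_real_counterexample(abstract_links, concrete_count, k, violates):
    """Look for a set of failed abstract links that violates the property
    while corresponding to at most k concrete link failures.
    Exhaustive enumeration stands in for the constrained SMT query."""
    for r in range(len(abstract_links) + 1):
        for failed in combinations(abstract_links, r):
            if sum(concrete_count[e] for e in failed) > k:
                continue  # exceeds the concrete failure budget
            if violates(set(failed)):
                return set(failed)
    return None  # no counterexample within budget: the original was spurious


links = ["L1", "L2", "L3"]
concrete_count = {"L1": 3, "L2": 1, "L3": 1}  # concrete links per abstract link


def violates(failed):
    return "L1" in failed or {"L2", "L3"} <= failed


print(find_real_counterexample(links, concrete_count, 1, violates))  # None
print(sorted(find_real_counterexample(links, concrete_count, 2, violates)))
# ['L2', 'L3']
```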


7 Evaluation 


We evaluate Origami on a collection of synthetic data center networks that
use BGP to implement shortest-paths routing policies over common industrial
data center topologies. Data centers are a good fit for our algorithms, as they can
be very large but are highly symmetrical and designed for fault tolerance. Data
center topologies (often called fattree topologies) are typically organized in layers,
with each layer containing many routers. Each router in a layer is connected
to a number of routers in the layer above (and below) it. The precise number of 
neighbors to which a router is connected, and the pattern of said connections, 
is part of the topology definition. We focus on two common topologies: fattree 
topologies used at Google (labelled FT20, FT40 and SP40 below) and a different 
fattree used at Facebook (labelled FB12). These are relatively large data center 
topologies ranging from 500 to 2000 nodes and 8000 to 64000 edges. 

SP40 uses a pure shortest paths routing policy. For the other experiments (FT20,
FT40, FB12), we augment shortest paths with additional policy that selectively
drops routing announcements, for example disabling, in various places, “valley
routing”, which allows up-down-up-down routes through the data center instead
of just up-down routes. The pure shortest paths policy represents a best-case
scenario for our technology as it gives rise to perfect symmetry and makes our 
heuristics especially effective. By adding variations in routing policy, we provide 
a greater challenge for our tool. 

Experiments were done on a Mac with a 4GHz i7 CPU and 16GB memory. 


7.1 Compression Results 


Figure 4 shows the level of compression achieved, along with the required time 
for compression and verification. In most cases, we achieve a high compression
ratio, especially in terms of links. This drastically reduces the possible failure
combinations for the underlying verification process. The cases of 10 link failures
on FT20 and 5 link failures on FbFT demonstrate another aspect of our
algorithm. Neither topology can sustain that many link failures, i.e. some concrete
nodes have fewer than 10 (resp. 5) neighbors. We can determine this as we
refine the abstraction; there are (abstract) nodes that do not satisfy the min 
cut requirement and we cannot refine them further. This constitutes an actual 
counterexample and explains why the abstraction of FT20 for 10 link failures is 
smaller than the one for 5 link failures. Importantly, we did not use the SMT 
solver to find this counterexample. Likewise, we did not need to run a min cut on 
the much larger concrete topology. Intuitively, the rest of the network remained
abstract, while the part that led to the counterexample became fully concrete. 
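The Ratio column of Fig. 4 is consistent with dividing the concrete size by the abstract size, separately for nodes and for edges. The table shows fewer digits in some rows. The small Python check below recomputes two rows. The helper name is ours.

```python
def compression_ratio(concrete, abstract):
    """Concrete size divided by abstract size, for nodes and for edges."""
    (cv, ce), (av, ae) = concrete, abstract
    return round(cv / av, 2), round(ce / ae, 2)


print(compression_ratio((500, 8000), (9, 20)))      # FT20, 1 failure: (55.56, 400.0)
print(compression_ratio((2000, 64000), (45, 220)))  # FT40, 3 failures: (44.44, 290.91)
```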


7.2 Verification Performance 


The verification time of Origami is dominated by abstraction time and SMT
time, both of which are shown in Fig. 4. In practice, there is also some time taken
to parse and pre-process the configurations, but it is negligible. The abstraction
time is highly dependent on the size of the network and the abstraction search
breadth used. In these experiments, the breadth was set to 25, a relatively high value.

While the verification time for a high number of link failures is not negligible, 
we found that verification without abstraction is essentially impossible. We used 
Minesweeper [2], the state-of-the-art SMT-based network verifier, to verify the 
same fault tolerance properties and it was unable to solve any of our queries. 
This is not surprising, as SMT-based verifiers do not scale to networks beyond 
the size of FT20 even without any link failures. 


7.3 Refinement Effectiveness 


We now evaluate the effectiveness of our search and refinement techniques. 


Effectiveness of Search. To assess the effectiveness of the search procedure, we
compute an initial abstraction of the FT20 network suitable for 5 link failures,
using different values of the search breadth. On top of this, we additionally consider
the impact of some of the heuristics described in Sect. 5. Figure 5 presents
the size (the number of nodes on the y axis and the number of edges on top
of the bars) of the computed abstractions with respect to various values for the
breadth of search and sets of heuristics:


— Heuristics off means that (almost) all heuristics are turned off. We still try
to split nodes that are on the cut-set.

— Reachable off means that we do not bias towards splitting of nodes in the
reachable portion of the cut-set.

— Common off means that we do not bias towards splitting reachable nodes
that have the most connections to unreachable nodes.


[Figure 5 plot: FT20 Abstractions. x axis: Search Breadth (1, 5, 15, 25); y axis: number of nodes; series: Heuristics off, Reachable off, Common off, All Heuristics.]

Fig. 5. The initial abstraction of FT20 for 5 link failures using different heuristics and
search breadth. On top of the bars is the number of edges of each abstraction.


The results of this experiment show that in order to achieve effective compres- 
sion ratios we need to employ both smart heuristics and a wide search through 
the space of abstractions. It is possible that increasing the search breadth would
make the heuristics redundant; however, in most cases this would make the
refinement process exceed acceptable time limits.


Use of Counterexamples. We now assess how important it is to (1) use sym- 
metries in policy to infer more information from counterexamples, and (2) min- 
imize the counterexample provided by the solver. 

We see in Fig. 6 that disabling either of them increases the number of refinement
iterations. While each of these refinements is performed quickly, the same cannot
be guaranteed of the verification process that runs between them. Hence, it is
important to keep the number of refinement iterations as low as possible.


8 Related Work 


Our approach to network fault-tolerance verification draws heavily from ideas in 
prior work exploiting symmetry and abstraction in model checking [4,6,17] and 
automatic abstraction refinement via CEGAR [1,5,9]. However, we apply these 
ideas to network routing, which introduces different challenges and opportunities. 
For example, our notion of abstraction ($\forall\exists$-abstraction) differs from the typical
existential abstraction used in model checking [6]. In addition, we have to deal 
with network topological structure and asymmetries introduced by failures. 
Bonsai [3] and Surgeries [22] both leverage abstraction to accelerate
verification for routing protocols and packet forwarding, respectively. Both
tools compute a single abstract network that is bisimilar to the original
concrete network. Alas, neither approach can be used to reason about
properties when faults may occur. Minesweeper [2] is a general approach
to control plane verification based on a stable state encoding, which
leverages an SMT solver in the back-end. It supports a wide range
of routing protocols and properties, including fault tolerance properties.
Our compression is complementary to such tools; it is used to alleviate the
scaling problem that Minesweeper faces with large networks.


[Figure 6 plot: FT20 Counterexample Optimizations. x axis: Link Failures; y axis: SMT time; series: Symmetric policies off, Minimize counterexamples off, All Optimizations.]

Fig. 6. Effectiveness of minimizing counterexamples and of learning unused edges. On
top of the bars is the number of SMT calls.

With respect to verification of fault tolerance, ARC [13] translates a limited 
class of routing policies to a weighted graph where fault-tolerance properties can 
be checked using graph algorithms. However, ARC only handles shortest path 
routing and cannot support stateful features such as BGP communities or local
preference. While ARC applies graph algorithms on a statically-computed
graph, we use graph algorithms as part of a refinement loop in conjunction with 
a general purpose solver. 


9 Conclusions 


We present a new theory of distributed routing protocols in the presence of 
bounded link failures, and we use the theory to develop algorithms for network 
compression and counterexample-guided verification of fault tolerance proper- 
ties. In doing so, we observe that (1) even though abstract networks route differ- 
ently from concrete ones in the presence of failures, the concrete routes wind up 
being “at least as good” as the abstract ones when networks satisfy reasonable 
well-formedness constraints, and (2) using efficient graph algorithms (min cut) 
in the middle of the CEGAR loop speeds the search for refinements. 

We implemented our algorithms in a network verification tool called Origami. 
Evaluation of the tool on synthetic networks shows that our algorithms accelerate 
verification of fault tolerance properties significantly, making it possible to verify 
networks out of reach of other state-of-the-art tools. 
Abstract. Recent distributed systems have introduced variations of
familiar abstract data types (ADTs) like counters, registers, flags, 
and sets, that provide high availability and partition tolerance. These 
conflict-free replicated data types (CRDTs) utilize mechanisms to resolve 
the effects of concurrent updates to replicated data. Naturally these 
objects weaken their consistency guarantees to achieve availability and 
partition-tolerance, and various notions of weak consistency capture 
those guarantees. 

In this work we study the tractability of CRDT-consistency check- 
ing. To capture guarantees precisely, and facilitate symbolic reasoning, 
we propose novel logical characterizations. By developing novel reduc- 
tions from propositional satisfiability problems, and novel consistency- 
checking algorithms, we discover both positive and negative results. In 
particular, we show intractability for replicated flags, sets, counters, and 
registers, yet tractability for replicated growable arrays. Furthermore, we 
demonstrate that tractability can be redeemed for registers when each 
value is written at most once, for counters when the number of replicas 
is fixed, and for sets and flags when the number of replicas and variables 
is fixed. 


1 Introduction 


Recent distributed systems have introduced variations of familiar abstract 
data types (ADTs) like counters, registers, flags, and sets, that provide high 
availability and partition tolerance. These conflict-free replicated data types 
(CRDTs) [33] efficiently resolve the effects of concurrent updates to replicated 
data. Naturally they weaken consistency guarantees to achieve availability and 
partition-tolerance, and various notions of weak consistency capture such guar- 
antees [8,11,29,35,36]. 

In this work we study the tractability of CRDT consistency checking; Fig. 1
summarizes our results. In particular, we consider runtime verification: deciding
whether a given execution of a CRDT is consistent with its ADT specification.
This problem is particularly relevant as distributed-system testing tools
like Jepsen [25] are appearing; without efficient, general consistency-checking
algorithms, such tools could be limited to specialized classes of errors like node
crashes.


Data Types                           | Complexity
Add-Wins Set, Remove-Wins Set        | NP-complete
Enable-Wins Flag, Disable-Wins Flag  | NP-complete
Sets & Flags, with bounded domains   | PTIME
Last-Writer-Wins Register (LWW)      | NP-complete
Multi-Value Register (MVR)           | NP-complete
Registers, with unique values        | PTIME
Replicated Counters                  | NP-complete
Counters, with bounded replicas      | PTIME
Replicated Growable Array (RGA)      | PTIME


Fig. 1. The complexity of consistency checking for various replicated data types. We
demonstrate intractability and tractability results in Sects. 3 and 4, respectively.

Our setting captures executions across a set of replicas as per-replica 
sequences of operations called histories. Roughly speaking, a history is con- 
sistent so long as each operation’s return value can be justified according to 
the operations that its replica has observed so far. In the setting of CRDTs, 
the determination of a replica’s observations is essentially an implementation 
choice: replicas are only obliged to observe their own operations, and the pre- 
decessors of those it has already observed. This relatively-weak constraint on 
replicas’ observations makes the CRDT consistency checking problem unique. 

Our study proceeds in three parts. First, to precisely characterize the con- 
sistency of various CRDTs, and facilitate symbolic reasoning, we develop novel 
logical characterizations to capture their guarantees. Our logical models are built 
on a notion of abstract execution, which relates the operations of a given history 
with three separate relations: a read-from relation, governing the observations 
from which a given operation constitutes its own return value; a happens-before 
relation, capturing the causal relationships among operations; and a linearization 
relation, capturing any necessary arbitration among non-commutative effects 
which are executed concurrently, e.g., following a last-writer-wins policy. Accord- 
ingly, we capture data type specifications with logical axioms interpreted over 
the read-from, happens-before, and linearization relations of abstract executions, 
reducing the consistency problem to: does there exist an abstract execution over 
the given history which satisfies the axioms of the given data type? 

Second, we demonstrate the intractability of several replicated data types 
by reduction from propositional satisfiability (SAT) problems. In particular, we 
consider the 1-in-3 SAT problem [19], which asks for a truth assignment to 
the variables of a given set of clauses such that exactly one literal per clause
is assigned true. Our reductions essentially simulate the existential choice of a 
truth assignment with the existential choice of the read-from and happens-before 
relations of an abstract execution. For a given 1-in-3 SAT instance, we construct 
a history of replicas obeying carefully-tailored synchronization protocols, which 
is consistent exactly when the corresponding SAT instance is positive. 
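To pin down the source problem of these reductions, here is a brute-force 1-in-3 SAT decider in Python. It runs in exponential time and only illustrates the problem statement. It says nothing about how the history construction works. The clause encoding, with literals as (variable, polarity) pairs, is an assumption made for the example.

```python
from itertools import product


def one_in_three_sat(clauses):
    """Brute-force 1-in-3 SAT. A clause is a list of literals (name, polarity);
    returns an assignment with exactly one true literal per clause, or None."""
    variables = sorted({x for clause in clauses for x, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(sum(assignment[x] == pol for x, pol in clause) == 1
               for clause in clauses):
            return assignment
    return None


sat = [[("x", True), ("y", True), ("z", True)]]
unsat = sat + [[("x", False), ("y", False), ("z", False)]]
print(one_in_three_sat(sat))    # {'x': False, 'y': False, 'z': True}
print(one_in_three_sat(unsat))  # None
```

The second instance is unsatisfiable because it asks for exactly one of x, y, z to be true and, at the same time, exactly one of them to be false.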

Third, we develop tractable consistency-checking algorithms for individual 
data types and special cases: replicated growable arrays; multi-value and last-writer-wins
registers, when each value is written only once; counters, when replicas
are bounded; and sets and flags, when their sizes are also bounded. While
the algorithms for each case are tailored to the algebraic properties of the data 
types they handle, they essentially all function by constructing abstract execu- 
tions incrementally, processing replicas’ operations in prefix order. 

The remainder of this article is organized around our three key contributions: 


1. We develop novel logical characterizations of consistency for the replicated 
register, flag, set, counter, and array data types (Sect. 2); 

2. We develop novel reductions from propositional satisfiability problems to consistency
checking to demonstrate intractability for replicated flags, sets, counters,
and registers (Sect. 3); and

3. We develop tractable consistency-checking algorithms for replicated growable 
arrays, registers, when written values are unique, counters, when replicas are 
bounded, and sets and flags, when their sizes are also bounded (Sects. 4-6). 


Section 7 overviews related work, and Sect. 8 concludes. 


2 A Logical Characterization of Replicated Data Types 


In this section we describe an axiomatic framework for defining the semantics 
of replicated data types. We consider a set of method names M, and that each 
method m € M has a number of arguments and a return value sampled from a 


data domain D. We will use operation labels of the form m(a) + b to represent 
the call of a method m € M, with argument a € D, and resulting in the value 
b € D. Since there might be multiple calls to the same method with the same 
arguments and result, labels are tagged with a unique identifier 7. We will ignore 
identifiers when unambiguous. 

The interaction between a data type implementation and a client is repre- 
sented by a history h = (Op,ro) which consists of a set of operation labels Op 
and a partial replica order ro ordering operations issued by the client on the 
same replica. Usually, ro is a union of sequences, each sequence representing the 
operations issued on the same replica, and the width of ro, i.e., the maximum 
number of mutually-unordered operations, gives the number of replicas in a given 
history. 

To characterize the set of histories h = (Op, ro) admitted by a certain replicated
data type, we use abstract executions e = (rf, hb, lin), which include:

- a read-from binary relation rf over operations in Op, which identifies the set
of updates needed to “explain” a certain return value, e.g., a write operation
explaining the return value of a read,

- a strict partial happens-before order hb, which includes ro and rf, representing
the causality constraints in an execution, and

- a strict total linearization order lin, which includes hb, used to model
conflict-resolution policies based on timestamps.
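As an illustration only, the following Python sketch checks the structural requirements listed above for a candidate abstract execution: hb must be a strict partial order containing ro and rf, and lin a strict total order containing hb. Relations are modeled as sets of pairs of operation identifiers, and all function names (`transitive_closure`, `well_formed`, and so on) are hypothetical; this is not code from the paper.

```python
def transitive_closure(pairs):
    """Transitive closure of a binary relation given as a set of pairs (Warshall)."""
    nodes = {x for pair in pairs for x in pair}
    reach = set(pairs)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def is_strict_partial_order(rel):
    """Irreflexive and transitive (hence acyclic)."""
    rel = set(rel)
    return all(a != b for a, b in rel) and transitive_closure(rel) == rel


def is_strict_total_order(rel, ops):
    """A strict partial order relating every pair of distinct operations."""
    return is_strict_partial_order(rel) and all(
        (a, b) in rel or (b, a) in rel for a in ops for b in ops if a != b
    )


def well_formed(ops, ro, rf, hb, lin):
    """Structural constraints on an abstract execution (rf, hb, lin) of (ops, ro)."""
    hb, lin = set(hb), set(lin)
    return (
        is_strict_partial_order(hb)
        and set(ro) <= hb
        and set(rf) <= hb
        and is_strict_total_order(lin, ops)
        and hb <= lin
    )
```

For instance, with a write "w" read by "r", the execution with hb = lin = {("w", "r")} is well formed, while ordering "r" before "w" in lin is rejected because lin must include hb.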


In this work, we consider replicated data types which satisfy causal consistency [26],
i.e., updates which are related by cause and effect are observed by all replicas
in the same order. This follows from the fact that the happens-before order is
constrained to be a partial order, and thus transitive (other forms of weak
consistency do not impose this constraint). Some of the replicated data types we
consider in this paper do not use conflict-resolution policies based on timestamps,
and in those cases the linearization order can be ignored.


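$$
\begin{aligned}
&\textsc{ReadFrom}(R): && \forall o_1, o_2.\ \mathsf{rf}(o_1, o_2) \Rightarrow R(o_1, o_2) \\
&\textsc{RetValSet}(X, v, Y): && \forall o_1.\ \mathit{meth}(o_1) = X \wedge \mathit{ret}(o_1) = v \Leftrightarrow \exists o_2.\ \mathsf{rf}(o_2, o_1) \wedge \mathit{meth}(o_2) = Y \wedge \mathit{arg}(o_1) = \mathit{arg}(o_2) \\
&\textsc{ReadFromMaximal}(R): && \forall o_1, o_2, o_3.\ \mathsf{rf}(o_1, o_2) \wedge R(o_3, o_2) \Rightarrow \neg\mathsf{hb}(o_1, o_3) \vee \neg\mathsf{hb}(o_3, o_2) \\
&\textsc{ReadAllMaximals}(R): && \forall o_1, o_2.\ \mathsf{hb}(o_1, o_2) \wedge R(o_1, o_2) \Rightarrow \exists o_3.\ \mathsf{hb}^{*}(o_1, o_3) \wedge \mathsf{rf}(o_3, o_2) \\
&\textsc{ClosedRf}(R): && \forall o_1, o_2, o_3.\ R(o_1, o_2) \wedge \mathsf{hb}(o_1, o_3) \wedge \mathsf{rf}(o_3, o_2) \Rightarrow \mathsf{rf}(o_1, o_2) \\
&\textsc{RetValCounter}: && \forall o_1.\ \mathit{meth}(o_1) = \mathsf{read} \Rightarrow \mathit{ret}(o_1) = |\{o_2 : \mathit{meth}(o_2) = \mathsf{inc} \wedge \mathsf{rf}(o_2, o_1)\}| - |\{o_2 : \mathit{meth}(o_2) = \mathsf{dec} \wedge \mathsf{rf}(o_2, o_1)\}| \\
&\textsc{LinLWW}: && \forall o_1, o_2, o_3.\ \mathsf{rf}(o_1, o_2) \wedge \mathit{meth}(o_3) = \mathsf{write} \wedge \mathit{arg}_1(o_3) = \mathit{arg}(o_2) \wedge \mathsf{hb}(o_3, o_2) \Rightarrow \mathsf{lin}(o_3, o_1) \\
&\textsc{RetValReg}: && \forall o_1, v.\ \mathit{meth}(o_1) = \mathsf{read} \wedge v \in \mathit{ret}(o_1) \Rightarrow \exists! o_2.\ \mathsf{rf}(o_2, o_1) \wedge \mathit{meth}(o_2) = \mathsf{write} \wedge \mathit{arg}_2(o_2) = v
\end{aligned}
$$

Fig. 2. The axiomatic semantics of replicated data types. Quantified variables are
implicitly distinct, and ∃!o denotes the existence of a unique operation o.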


A replicated data type is defined by a set of first-order axioms Φ characterizing
the relations in an abstract execution. A history h is admitted by a data type
when there exists an abstract execution e such that (h, e) ⊨ Φ. The satisfaction
relation ⊨ is defined as usual in first-order logic. The admissibility problem is
the problem of checking whether a history h is admitted by a given data type.

In the following, we define the replicated data types with respect to which 
we study the complexity of the admissibility problem. The axioms used to 
define them are listed in Figs. 2 and 3. These axioms use the function symbols
meth-od, arg-ument, and ret-urn interpreted over operation labels, whose seman- 
tics is self-explanatory. 


2.1 Replicated Sets and Flags 


The Add-Wins Set and Remove-Wins Set [34] are two implementations of a replicated
set with operations add(x), remove(x), and contains(x) for adding, removing,
and checking membership of an element x. Although the meaning of these methods
is self-evident from their names, the result of conflicting concurrent operations
is not. When concurrent add(x) and remove(x) operations are delivered to
a certain replica, the Add-Wins Set chooses to keep the element x in the set, so
every subsequent invocation of contains(x) on this replica returns true, while the
Remove-Wins Set makes the dual choice of removing x from the set.

The formal definition of their semantics uses abstract executions where
the read-from relation associates sets of add(x) and remove(x) operations to
contains(x) operations. Therefore, the predicate ReadOk(o1, o2) is defined by

$$\mathit{meth}(o_1) \in \{\mathsf{add}, \mathsf{remove}\} \wedge \mathit{meth}(o_2) = \mathsf{contains} \wedge \mathit{arg}(o_1) = \mathit{arg}(o_2)$$

and the Add-Wins Set is defined by the following set of axioms:

$$\textsc{ReadFrom}(\mathit{ReadOk}) \wedge \textsc{ReadFromMaximal}(\mathit{ReadOk}) \wedge \textsc{ReadAllMaximals}(\mathit{ReadOk}) \wedge \textsc{RetValSet}(\mathsf{contains}, \mathsf{true}, \mathsf{add})$$


READFROMMAXIMAL says that every operation read by a contains(x) is maximal 
among its hb-predecessors that add or remove x while READALLMAXIMALS says 
that all such maximal hb-predecessors are read. The RETVALSET instantiation 
ensures that a contains(x) returns true iff it reads-from at least one add(x). 

The definition of the Remove-Wins Set is similar, except for the parameters
of RETVALSET, which become RETVALSET(contains, false, remove), i.e., a
contains(x) returns false iff it reads-from at least one remove(x).
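A minimal sketch of how these axioms can be checked for a given candidate (rf, hb), assuming operations are encoded as hypothetical (method, element, return value) triples keyed by an identifier. It instantiates READFROM, READFROMMAXIMAL, READALLMAXIMALS, and RETVALSET literally; reading hb* as the reflexive closure of hb (so o3 may be o1 itself) is our interpretation.

```python
def read_ok(o1, o2, ops):
    """ReadOk for sets: o1 adds/removes the element that contains-operation o2 queries."""
    m1, x1, _ = ops[o1]
    m2, x2, _ = ops[o2]
    return m1 in ("add", "remove") and m2 == "contains" and x1 == x2


def check_set(ops, rf, hb, add_wins=True):
    """Check the Add-Wins (or Remove-Wins) axioms for a candidate (rf, hb)."""
    ids = list(ops)
    # READFROM(ReadOk)
    if not all(read_ok(a, b, ops) for a, b in rf):
        return False
    # READFROMMAXIMAL(ReadOk): no candidate o3 lies strictly hb-between a source o1 and reader o2
    for o1, o2 in rf:
        for o3 in ids:
            if (o3 not in (o1, o2) and read_ok(o3, o2, ops)
                    and (o1, o3) in hb and (o3, o2) in hb):
                return False
    # READALLMAXIMALS(ReadOk): each hb-preceding candidate is read, or hb-precedes a read source
    for o1 in ids:
        for o2 in ids:
            if o1 != o2 and (o1, o2) in hb and read_ok(o1, o2, ops):
                if not any((o3 == o1 or (o1, o3) in hb) and (o3, o2) in rf for o3 in ids):
                    return False
    # RETVALSET(contains, true, add) or RETVALSET(contains, false, remove)
    value, winner = (True, "add") if add_wins else (False, "remove")
    for o1, (m1, x1, r1) in ops.items():
        lhs = m1 == "contains" and r1 == value
        rhs = any((o2, o1) in rf and ops[o2][0] == winner and ops[o2][1] == x1 for o2 in ids)
        if lhs != rhs:
            return False
    return True
```

With concurrent add(x) and remove(x) both read by a contains(x) returning true, the same (rf, hb) satisfies the Add-Wins axioms but violates RETVALSET(contains, false, remove).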

The Enable-Wins Flag and Disable-Wins Flag are implementations of a set
of flags with operations enable(x), disable(x), and read(x), where enable(x) turns
the flag x to true, disable(x) turns x to false, and read(x) returns the state of
the flag x. Their semantics is similar to that of the Add-Wins Set and Remove-Wins
Set, respectively, where enable(x), disable(x), and read(x) play the role of add(x),
remove(x), and contains(x), respectively. Their axioms are defined as above.


2.2 Replicated Registers 


We consider two variations of replicated registers called Multi-Value Register
(MVR) and Last-Writer-Wins Register (LWW) [34] which maintain a set of registers
and provide write(x,v) operations for writing a value v on a register x
and read(x) operations for reading the content of a register x (the domain of
values is kept unspecified since it is irrelevant). A read(x) operation of
MVR returns all the values written by concurrent writes which are maximal
among its happens-before predecessors, thereby leaving the responsibility for
resolving conflicts between concurrent writes to the client. In contrast, a read(x)
operation of LWW returns a single value chosen using a conflict-resolution policy
based on timestamps: each written value is associated with a timestamp, and a read
operation returns the most recent value w.r.t. the timestamps. This order between
timestamps is modeled using the linearization order of an abstract execution.
Therefore, the predicate ReadOk(o1, o2) is defined by

$$\mathit{meth}(o_1) = \mathsf{write} \wedge \mathit{meth}(o_2) = \mathsf{read} \wedge \mathit{arg}_1(o_1) = \mathit{arg}(o_2) \wedge \mathit{arg}_2(o_1) \in \mathit{ret}(o_2)$$

(we use arg1(o1) to denote the first argument of a write operation, i.e., the register
name, and arg2(o1) to denote its second argument, i.e., the written value) and
the MVR is defined by the following set of axioms:


$$\textsc{ReadFrom}(\mathit{ReadOk}) \wedge \textsc{ReadFromMaximal}(\mathit{ReadOk}) \wedge \textsc{ReadAllMaximals}(\mathit{ReadOk}) \wedge \textsc{RetValReg}$$

where RETVALREG ensures that a read(x) operation reads from a write(x,v)
operation, for each value v in the set of returned values.

LWW is obtained from the definition of MVR by replacing READALLMAXIMALS
with the axiom LINLWW, which ensures that every write(x,_) operation
which happens-before a read(x) operation is linearized before the write(x,_)
operation from which the read(x) takes its value (when these two write operations
are different). This definition of LWW is inspired by the “bad-pattern”
characterization in [6], corresponding to their causal convergence criterion.
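To make the difference between the two registers concrete, here is a small Python sketch of the value a read must return, under our own encoding assumptions (writes keyed by identifier and mapped to (register, value), hb as a set of pairs, lin as a list). For MVR it collects the values of the hb-maximal visible writes; for LWW it takes the lin-greatest visible write, which is what LINLWW forces the read-from source to be. It assumes initial writes exist (footnote 1), so at least one write is visible.

```python
def visible_writes(read_id, reg, writes, hb):
    """Writes to register `reg` that happen-before the read."""
    return [w for w, (x, _) in writes.items() if x == reg and (w, read_id) in hb]


def mvr_read(read_id, reg, writes, hb):
    """MVR: the set of values of the hb-maximal visible writes."""
    vis = visible_writes(read_id, reg, writes, hb)
    maximal = [w for w in vis if not any((w, w2) in hb for w2 in vis)]
    return {writes[w][1] for w in maximal}


def lww_read(read_id, reg, writes, hb, lin):
    """LWW: the value of the lin-greatest visible write (lin given as a list)."""
    pos = {op: i for i, op in enumerate(lin)}
    last = max(visible_writes(read_id, reg, writes, hb), key=pos.__getitem__)
    return writes[last][1]
```

With an initial write w0 followed by concurrent writes w1 (value 1) and w2 (value 2), both visible to a read r, MVR returns {1, 2}, whereas LWW returns the value of whichever of w1 and w2 comes later in lin.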


2.3 Replicated Counters 


The replicated counter datatype [34] maintains a set of counters interpreted as
integers (the counters can become negative). This datatype provides operations
inc(x) and dec(x) for incrementing and decrementing a counter x, and read(x)
operations to read the value of the counter x. The semantics of the replicated
counter is quite standard: a read(x) operation returns the value computed as
the difference between the number of inc(x) operations and dec(x) operations
among its happens-before predecessors. The axioms defined below enforce
that a read(x) operation reads-from all its happens-before predecessors
which are inc(x) or dec(x) operations.
Therefore, the predicate ReadOk(o1, o2) is defined by

$$\mathit{meth}(o_1) \in \{\mathsf{inc}, \mathsf{dec}\} \wedge \mathit{meth}(o_2) = \mathsf{read} \wedge \mathit{arg}(o_1) = \mathit{arg}(o_2)$$

and the replicated counter is defined by the following set of axioms:

$$\textsc{ReadFrom}(\mathit{ReadOk}) \wedge \textsc{ClosedRf}(\mathit{ReadOk}) \wedge \textsc{RetValCounter}.$$
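The following sketch, under a hypothetical encoding of operations as (method, counter, return value) triples keyed by identifier, checks READFROM(ReadOk), CLOSEDRF(ReadOk), and RETVALCOUNTER for a candidate (rf, hb). It is illustrative only and follows the axioms of Fig. 2 literally.

```python
def check_counter_axioms(ops, rf, hb):
    """Check the replicated-counter axioms for a candidate (rf, hb)."""

    def read_ok(o1, o2):
        # ReadOk: o1 is an inc/dec of the counter that read-operation o2 reads
        return (ops[o1][0] in ("inc", "dec") and ops[o2][0] == "read"
                and ops[o1][1] == ops[o2][1])

    # READFROM(ReadOk)
    if not all(read_ok(a, b) for a, b in rf):
        return False
    # CLOSEDRF(ReadOk): if o1 hb-precedes a source o3 of read o2, then o2 reads from o1 too
    for o1 in ops:
        for o2 in ops:
            for o3 in ops:
                if (len({o1, o2, o3}) == 3 and read_ok(o1, o2) and (o1, o3) in hb
                        and (o3, o2) in rf and (o1, o2) not in rf):
                    return False
    # RETVALCOUNTER: a read returns #inc minus #dec among the operations it reads from
    for o1, (m, _, ret) in ops.items():
        if m == "read":
            incs = sum(1 for o2 in ops if (o2, o1) in rf and ops[o2][0] == "inc")
            decs = sum(1 for o2 in ops if (o2, o1) in rf and ops[o2][0] == "dec")
            if ret != incs - decs:
                return False
    return True
```

In the test below, dropping the first increment from rf while keeping the second one, which it happens-before, violates CLOSEDRF.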


1 For simplicity, we assume that every history contains a set of write operations writing 
the initial values of variables, which precede every other operation in replica order. 
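$$
\begin{aligned}
&\textsc{ReadFromRGA}: \\
&\quad \forall o_2.\ \big(\mathit{meth}(o_2) = \mathsf{addAfter} \Rightarrow \mathit{arg}_1(o_2) = \circ \vee \exists o_1.\ \mathit{meth}(o_1) = \mathsf{addAfter} \wedge \mathit{arg}_2(o_1) = \mathit{arg}_1(o_2) \wedge \mathsf{rf}(o_1, o_2)\big) \\
&\quad \wedge \big(\mathit{meth}(o_2) = \mathsf{remove} \Rightarrow \exists o_1.\ \mathit{meth}(o_1) = \mathsf{addAfter} \wedge \mathit{arg}_2(o_1) = \mathit{arg}(o_2) \wedge \mathsf{rf}(o_1, o_2)\big) \\
&\quad \wedge \big(\mathit{meth}(o_2) = \mathsf{read} \Rightarrow \forall v \in \mathit{ret}(o_2).\ \exists o_1.\ \mathit{meth}(o_1) = \mathsf{addAfter} \wedge \mathit{arg}_2(o_1) = v \wedge \mathsf{rf}(o_1, o_2)\big) \\[4pt]
&\textsc{RetValRGA}: \\
&\quad \forall o_1, o_2.\ \mathit{meth}(o_1) = \mathsf{read} \wedge \mathit{meth}(o_2) = \mathsf{addAfter} \wedge \mathsf{hb}(o_2, o_1) \wedge \mathit{arg}_2(o_2) \notin \mathit{ret}(o_1) \\
&\qquad \Rightarrow \exists o_3.\ \mathit{meth}(o_3) = \mathsf{remove} \wedge \mathit{arg}(o_3) = \mathit{arg}_2(o_2) \wedge \mathsf{rf}(o_3, o_1) \\[4pt]
&\textsc{LinRGA}: \\
&\quad \forall o_1, o_2.\ \big(\mathit{meth}(o_1) = \mathit{meth}(o_2) = \mathsf{addAfter} \wedge \mathit{arg}_1(o_1) = \mathit{arg}_1(o_2) \wedge {} \\
&\qquad \exists o_3, o_4, o_5.\ \mathit{meth}(o_3) = \mathit{meth}(o_4) = \mathsf{addAfter} \wedge \mathsf{rf}^{*}_{\mathsf{addAfter}}(o_1, o_3) \wedge \mathsf{rf}^{*}_{\mathsf{addAfter}}(o_2, o_4) \wedge {} \\
&\qquad \mathit{meth}(o_5) = \mathsf{read} \wedge \mathit{arg}_2(o_4) <_{o_5} \mathit{arg}_2(o_3)\big) \Rightarrow \mathsf{lin}(o_1, o_2)
\end{aligned}
$$

Fig. 3. Axioms used to define the semantics of RGA.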


2.4 Replicated Growable Array 


The Replicated Growing Array (RGA) [32] is a replicated list used for
text-editing applications. RGA supports three operations: addAfter(a,b), which adds
the character b immediately after the occurrence of the character a, assumed to
be present in the list; remove(a), which removes a, assumed to be present in the
list; and read(), which returns the list contents. It is assumed that a character is
added at most once². Conflicts between concurrent addAfter operations that
add a character immediately after the same character are resolved using timestamps
(i.e., each added character is associated with a timestamp and the order between
characters depends on the order between the corresponding timestamps), which
in the axioms below are modeled by the linearization order.
Figure 3 lists the axioms defining RGA. READFROMRGA ensures that:

- every addAfter(a,b) operation reads-from the addAfter(_,a) operation adding the
character a, except when a = ∘, which denotes the “root” element of the list³,

- every remove(a) operation reads-from the operation adding a, and

- every read operation returning a list containing a reads-from the operation
addAfter(_,a) adding a.


Then, RETVALRGA ensures that a read operation o1 happening-after an
operation adding a character a reads-from a remove(a) operation when a does not
occur in the list returned by o1 (the history must contain a remove(a) operation
because otherwise a would have occurred in the list returned by the read).

Finally, LINRGA models the conflict resolution policy by constraining the 
linearization order between addAfter(a,_) operations adding some character 


2 In a practical context, this can be enforced by tagging characters with replica
identifiers and sequence numbers.
3 This element is not returned by read operations.
immediately after the same character a. As a particular case, LINRGA enforces
that addAfter(a,b) is linearized before addAfter(a,c) when a read operation returns
a list where c precedes b (addAfter(a,b) results in the list a·b, and applying
addAfter(a,c) on a·b results in the list a·c·b). However, this is not sufficient:
assume that the history contains the two operations addAfter(a,b) and
addAfter(a,c) along with two operations remove(b) and addAfter(b,d). Then, a
read operation returning the list a·c·d must enforce that addAfter(a,b) is
linearized before addAfter(a,c), because this is the only order between these two
operations that can lead to the result a·c·d, i.e., executing addAfter(a,b),
addAfter(b,d), remove(b), addAfter(a,c) in this order. LINRGA deals with any
scenario where arbitrarily-many characters can be removed from the list:
$\mathsf{rf}^{*}_{\mathsf{addAfter}}$ is the reflexive and transitive closure of the
projection of rf on addAfter operations, and $<_{o_5}$ denotes the order between
characters in the list returned by the read operation $o_5$.
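The two scenarios above can be replayed with a sequential Python model of a single RGA replica. We assume the standard RGA insertion rule (skip over following characters whose timestamp is greater than that of the new character) and hypothetical integer timestamps that respect causality; a larger timestamp plays the role of a later position in lin. This is a sketch for intuition, not the reference implementation of [32].

```python
class RGAReplica:
    """Sequential replay of RGA operations at one replica, with tombstones for removals."""

    ROOT = "root"  # stands for the root element ∘, never returned by read()

    def __init__(self):
        self.nodes = [(self.ROOT, 0)]  # (character, timestamp) pairs in list order
        self.removed = set()           # tombstoned characters

    def add_after(self, a, b, ts):
        # Start right after the reference character a.
        i = next(k for k, (ch, _) in enumerate(self.nodes) if ch == a) + 1
        # Skip characters with a greater timestamp: under causality-respecting
        # timestamps these are later insertions, which stay closer to a.
        while i < len(self.nodes) and self.nodes[i][1] > ts:
            i += 1
        self.nodes.insert(i, (b, ts))

    def remove(self, a):
        self.removed.add(a)

    def read(self):
        return [ch for ch, _ in self.nodes[1:] if ch not in self.removed]
```

With addAfter(a,b) linearized before addAfter(a,c), the replay yields a·c·b and, in the second scenario, a·c·d; swapping the two operations in the second scenario yields a·d·c instead, which is why a read of a·c·d constrains lin.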


3 Intractability for Registers, Sets, Flags, and Counters 


In this section we demonstrate that checking the consistency is intractable for 
many widely-used data types. While this is not completely unexpected, since 
some related consistency-checking problems like sequential consistency are also 
intractable [20], this contrasts with recent tractability results for checking strong
consistency (i.e., linearizability) of common non-replicated data types like sets, 
maps, and queues [15]. In fact, in many cases we show that intractability even 
holds if the number of replicas is fixed. 

Our proofs of intractability follow the general structure of Gibbons and 
Korach’s proofs for the intractability of checking sequential consistency (SC) for 
atomic registers with read and write operations [20]. In particular, we reduce a 
specialized type of NP-hard propositional satisfiability (SAT) problem to check- 
ing whether histories are admitted by a given data type. While our construction 
borrows from Gibbons and Korach’s, the adaptation from SC to CRDT con- 
sistency requires a significant extension to handle the consistency relaxation 
represented by abstract executions: rather than a direct sequencing of threads’ 
operations, CRDT consistency requires the construction of three separate rela- 
tions: read-from, happens-before, and linearization. 

Technically, our reductions start from the 1-in-3 SAT problem [19]: given
a propositional formula $\bigwedge_{i=1}^{m} (\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$ with only
positive literals, i.e., $\alpha_i, \beta_i, \gamma_i \in \{x_1, \ldots, x_n\}$, does there exist an assignment to
the variables such that exactly one of $\alpha_i, \beta_i, \gamma_i$ per clause is assigned true? The
proofs of Theorems 1 and 2 reduce 1-in-3 SAT to CRDT consistency checking.
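For concreteness, here is a brute-force 1-in-3 SAT solver in Python (exponential in n, for illustration only). Clauses are triples of variable indices standing for α_i, β_i, γ_i; this encoding is our own.

```python
from itertools import product


def one_in_three_sat(clauses, n):
    """Brute-force 1-in-3 SAT over positive literals.

    clauses: list of triples of variable indices in 1..n.
    Returns a tuple of n booleans (entry j-1 is x_j) with exactly one true
    literal per clause, or None if no such assignment exists.
    """
    for bits in product((False, True), repeat=n):
        if all(sum(bits[v - 1] for v in clause) == 1 for clause in clauses):
            return bits
    return None
```

For example, the single clause (x1 ∨ x2 ∨ x3) is satisfied by making only x3 true (the first assignment found in this enumeration order), while (x1 ∨ x1 ∨ x1) has no 1-in-3 solution.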


Theorem 1. The admissibility problem is NP-hard when the number of replicas is
fixed for the following data types: Add-Wins Set, Remove-Wins Set, Enable-Wins
Flag, Disable-Wins Flag, Multi-Value Register, and Last-Writer-Wins Register.
             Replica 0           Replica 1           Replica 2
Round 0      Enable(x1)          Disable(x1)
             ...                 ...
             Enable(xn)          Disable(xn)
Barrier 1    Enable(y0)          Enable(y1)          Enable(y2)
             Read(y1) = true     Read(y0) = true     Read(y0) = true
             Read(y2) = true     Read(y2) = true     Read(y1) = true
Round 1      Read(α1) = true     Read(α1) = false    Read(α1) = false
             Read(β1) = false    Read(β1) = true     Read(β1) = false
             Read(γ1) = false    Read(γ1) = false    Read(γ1) = true
             Disable(α1)         Disable(β1)         Disable(γ1)
             Enable(β1)          Enable(γ1)          Enable(α1)
Barrier 2    Disable(y0)         Disable(y1)         Disable(y2)
             Read(y1) = false    Read(y0) = false    Read(y0) = false
             Read(y2) = false    Read(y2) = false    Read(y1) = false
...          ...                 ...                 ...
Round m      Read(αm) = true     Read(αm) = false    Read(αm) = false
             Read(βm) = false    Read(βm) = true     Read(βm) = false
             Read(γm) = false    Read(γm) = false    Read(γm) = true
             Disable(αm)         Disable(βm)         Disable(γm)
             Enable(βm)          Enable(γm)          Enable(αm)

Fig. 4. The encoding of a 1-in-3 SAT problem $\bigwedge_{i=1}^{m} (\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$
as a 3-replica history of a flag data type. Besides the flag variable $x_j$ for each
propositional variable $x_j$, the encoding adds per-replica variables $y_i$ for
synchronization barriers.


Proof. We demonstrate a reduction from the 1-in-3 SAT problem. For a given
problem $p = \bigwedge_{i=1}^{m} (\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$, we construct a 3-replica
history $h_p$ of the flag data type (either enable- or disable-wins), as illustrated
in Fig. 4. The encoding includes a flag variable $x_j$ for each propositional variable
$x_j$, along with a per-replica flag variable $y_i$ used to implement synchronization
barriers. Intuitively, executions of $h_p$ proceed in m + 1 rounds: the first round
corresponds to the assignment of a truth valuation, while subsequent rounds
check the validity of each clause given the assignment. The reductions to sets
and registers are slight variations on this proof, in which the Read, Enable, and
Disable operations are replaced with Contains, Add, and Remove, respectively,
or, for registers, with Read, Write of the value 1, and Write of the value 0, respectively.

It suffices to show that the constructed history $h_p$ is admitted if and only if
the given problem p is satisfiable. Since the flag data type does not constrain
the linearization relation of its abstract executions, we regard only the read-from
and happens-before components. It is straightforward to verify that the
happens-before relations of $h_p$'s abstract executions necessarily order:

1. every pair of operations in distinct rounds, due to barriers; and

2. every operation in a given round, over all replicas, without interleaving the
operations of distinct replicas within the same round, since a replica's
reads in a given round are only consistent with the other replicas' after the
re-enabling and -disabling of flag variables.


On the Complexity of Checking Consistency for Replicated Data Types 333 


In other words, replicas appear to execute atomically per round, in a
round-robin fashion. Furthermore, since all operations in a given round happen before
the operations of subsequent rounds, the values of flag variables are consistent
across rounds (i.e., as read by the first replica to execute in a given round),
and they are determined in the initial round either by conflict resolution (i.e.,
enable- or disable-wins) or by happens-before, in case conflict resolution would
have been inconsistent with subsequent reads.

In the “if” direction, let $r \in \{0, 1, 2\}^m$ be the positions of positively-assigned
variables in each clause, e.g., $r_i = 0$ implies $\alpha_i = \mathit{true}$ and $\beta_i = \gamma_i = \mathit{false}$.
We construct an abstract execution $e_p$ in which the happens-before relation
sequences the operations of replica $r_i$ before those of replica $(r_i + 1) \bmod 3$, and in
turn before those of replica $(r_i + 2) \bmod 3$. In other words, the replicas in round i
appear to execute in cyclic left-to-right order starting with replica $r_i$, whose reads
correspond to the satisfying assignment of $(\alpha_i \vee \beta_i \vee \gamma_i)$. The read-from relation of $e_p$
relates each Read($x_j$) = true operation to the most recent Enable($x_j$) operation in
happens-before order, which is unique since happens-before sequences the operations of
all rounds; the case for Read($x_j$) = false and Disable($x_j$) is symmetric. It is then
straightforward to verify that $e_p$ satisfies the axioms of the enable- or disable-wins
flag, and thus $h_p$ is admitted.

In the “only if” direction, let e be an abstract execution of $h_p$, and let
$r \in \{0, 1, 2\}^m$ be the replicas first to execute in each round according to the
happens-before order of e. It is straightforward to verify that the assignment in which a
given variable is set to true iff the replica encoding its positive assignment in
some clause executes first in its round, i.e.,

$$
x_j = \begin{cases}
\mathit{true} & \text{if } \exists i.\ (r_i = 0 \wedge \alpha_i = x_j) \vee (r_i = 1 \wedge \beta_i = x_j) \vee (r_i = 2 \wedge \gamma_i = x_j) \\
\mathit{false} & \text{otherwise,}
\end{cases}
$$

is a satisfying assignment to p.
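The assignment used in the “only if” direction can be decoded mechanically from the vector r. The sketch below (hypothetical helper names; 0-based list indices, with r[i] standing for round i+1) decodes it and checks the 1-in-3 condition. The vector [0, 0] in the test does not yield a satisfying assignment, so by the argument above it cannot arise from an abstract execution of h_p.

```python
def assignment_from_rounds(clauses, r, n):
    """Decode the assignment of the "only if" direction.

    r[i] in {0, 1, 2} is the replica executing first in round i+1; x_j is
    set to true iff it is the r[i]-th literal of some clause i.
    """
    value = [False] * n
    for clause, first in zip(clauses, r):
        value[clause[first] - 1] = True
    return value


def is_one_in_three(clauses, value):
    """Exactly one true literal per clause."""
    return all(sum(value[v - 1] for v in clause) == 1 for clause in clauses)
```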


Theorem 1 establishes intractability of consistency for the aforementioned
sets, flags, and registers, independently of the number of replicas (the reduction
uses only three). In contrast, our proof of Theorem 2 for counter data types depends
on the number of replicas, since our encoding requires two replicas per propositional
variable. Intuitively, since counter increments and decrements are commutative, the
initial round in the previous encoding would have fixed all counter values to zero.
Instead, the next encoding isolates initial increments and decrements to independent
replicas. The weaker result is indeed tight, since checking counter consistency with a
fixed number of replicas is polynomial time, as Sect. 5 demonstrates.


Theorem 2. The admissibility problem for the Counter data type is NP-hard. 


Proof. We demonstrate a reduction from the 1-in-3 SAT problem. For a given
problem $p = \bigwedge_{i=1}^{m} (\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$, we construct a history
$h_p$ of the counter data type over 2n + 3 replicas, as illustrated in Fig. 5.
Besides the differences imposed by the commutativity of counter increments and
decrements, our reduction follows the same strategy as in the proof of
Theorem 1: the happens-before relation of $h_p$'s abstract executions orders every
pair of operations in distinct rounds (of Replicas 0-2), and every operation in
a given (non-initial) round. As before, Replicas 0-2 appear to execute atomically
per round, in a round-robin fashion, and counter variables are consistent
across rounds. The key difference is that here abstract executions' happens-before
relations only relate the operations of either Replica 2j+1 or 2j+2, for
each j = 1, ..., n, to operations in subsequent rounds: the other's operations are
never observed by other replicas. Our encoding ensures that exactly one of each
pair is observed by ensuring that the reads of the counter y observe exactly n
increments, and by relying on the fact that every variable appears in some clause,
so that a read that observed neither or both would yield the value zero, which is
inconsistent with $h_p$. Otherwise, our reasoning follows the proof of Theorem 1, in
which the read-from relation selects all increments and decrements of the same counter
variable in happens-before order.
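The arithmetic behind one clause round of Fig. 5 can be checked with a few lines of Python. This sketch assumes the three literals of the clause are distinct variables and tracks only their three counters; replica k (k = 0, 1, 2) is the one testing literal α_i, β_i, γ_i, respectively. It shows that the round's reads succeed only if the replica whose literal is true executes first, and that the round restores the counter values for the next round.

```python
def run_round(values, start):
    """Simulate one clause round of Fig. 5 with replicas executing atomically.

    values: counter values [alpha_i, beta_i, gamma_i] at the start of the round
    (+1 for a true literal, -1 for a false one). Replica k expects literal k to
    read 1 and the others -1, then decrements literal k twice and increments
    literal (k + 1) mod 3 twice. Replicas run in the order start, start + 1,
    start + 2 (mod 3).
    Returns (all reads succeeded, counter values when the simulation stops).
    """
    vals = list(values)
    for step in range(3):
        k = (start + step) % 3
        expected = [1 if p == k else -1 for p in range(3)]
        if vals != expected:
            return False, vals
        vals[k] -= 2
        vals[(k + 1) % 3] += 2
    return True, vals
```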


4 Polynomial-Time Algorithms for Registers and Arrays 


We show that the problem of checking consistency is polynomial time for RGA, 
and even for LWW and MVR under the assumption that each value is written 
at most once, i.e., for each value v, the input history contains at most one write 
operation write(x,v). Histories satisfying this assumption are called differentiated.
This restriction is motivated by the fact that practical implementations
of these datatypes are data-independent [38], i.e., their behavior doesn’t
depend on the concrete values read or written and any potential buggy behavior 
can be exposed in executions where each value is written at most once. Also, 
in a testing environment, this restriction can be enforced by tagging each value 
with a replica identifier and a sequence number. 

In all three cases, the feature that enables polynomial-time consistency checking
is the fact that the read-from relation becomes fixed for a given history, i.e.,
if the history is consistent, then there exists exactly one read-from relation rf
that satisfies the READFROM-style and RETVAL-style axioms, and rf can be derived
syntactically from the operation labels (using those axioms). Then, our axiomatic
characterizations enable a consistency-checking algorithm which, roughly, consists
of instantiating those axioms in order to compute an abstract execution.


The consistency checking algorithm for RGA, LWW, and MVR is listed in 
Algorithm 1. It computes the three relations rf, hb, and lin of an abstract execu- 
tion using the datatype’s axioms. The history is declared consistent iff there exist 
satisfying rf and hb relations, and the relations hb and lin computed this way are 
acyclic. The acyclicity requirement comes from the definition of abstract execu- 
tions where hb and lin are required to be partial/total orders. While an abstract 
execution would require that lin is a total order, this algorithm computes a par- 
tial linearization order. However, any total order compatible with this partial 
linearization would satisfy the axioms of the datatype. 
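A compact Python sketch of the steps of Algorithm 1 for MVR on differentiated histories, under our own encoding of operations. ComputeRF matches each returned value with its unique writer, hb is the transitive closure of ro ∪ rf, and a cyclic hb is rejected. Instead of instantiating READFROMMAXIMAL and READALLMAXIMALS literally, the final loop checks the condition they capture as described in Sect. 2.2 (a read returns exactly the values of the hb-maximal writes to its register among its hb-predecessors); this simplification is ours.

```python
def transitive_closure(pairs):
    """Warshall-style transitive closure of a set of pairs."""
    nodes = {x for pair in pairs for x in pair}
    reach = set(pairs)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def check_mvr(ops, ro):
    """Consistency check for MVR on a differentiated history.

    ops maps an id to ("write", (reg, value), None) or ("read", (reg,), frozenset_of_values);
    ro is a set of replica-order pairs.
    """
    # ComputeRF: each returned value must come from the unique write of that value.
    rf = set()
    for r, (m, args, ret) in ops.items():
        if m != "read":
            continue
        for v in ret:
            sources = [w for w, (m2, a2, _) in ops.items()
                       if m2 == "write" and a2 == (args[0], v)]
            if len(sources) != 1:
                return False  # no writer (or several, outside differentiated histories)
            rf.add((sources[0], r))
    # hb = (ro ∪ rf)+ must be acyclic.
    hb = transitive_closure(set(ro) | rf)
    if any(a == b for a, b in hb):
        return False
    # Each read returns exactly the values of the hb-maximal writes to its register.
    for r, (m, args, ret) in ops.items():
        if m != "read":
            continue
        vis = [w for w, (m2, a2, _) in ops.items()
               if m2 == "write" and a2[0] == args[0] and (w, r) in hb]
        maximal = {ops[w][1][1] for w in vis if not any((w, w2) in hb for w2 in vis)}
        if maximal != set(ret):
            return False
    return True
```

In the test, a stale variant where the second replica returns the initial value 0 alongside 2 (although the write of 2 overwrote it) is rejected, and so is a history in which a read observes a write that its own replica issues later.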

ComputeRF computes the read-from relation rf satisfying the READFROM-style
and RETVAL-style axioms. In the case of LWW and MVR, it defines rf as the set
             Replica 0           Replica 2j+1        Replica 2j+2        (for each j = 1, ..., n)
Round 0                          Inc(y)              Inc(y)
                                 Inc(xj)             Dec(xj)
             Read(y) = n

             Replica 0           Replica 1           Replica 2
Barrier 1    Inc(z)              Inc(z)              Inc(z)
             Read(z) = 3         Read(z) = 3         Read(z) = 3
Round 1      Read(α1) = 1        Read(α1) = −1       Read(α1) = −1
             Read(β1) = −1       Read(β1) = 1        Read(β1) = −1
             Read(γ1) = −1       Read(γ1) = −1       Read(γ1) = 1
             Dec(α1); Dec(α1)    Dec(β1); Dec(β1)    Dec(γ1); Dec(γ1)
             Inc(β1); Inc(β1)    Inc(γ1); Inc(γ1)    Inc(α1); Inc(α1)
Barrier 2    Dec(z)              Dec(z)              Dec(z)
             Read(z) = 0         Read(z) = 0         Read(z) = 0
...          ...                 ...                 ...
Round m      Read(αm) = 1        Read(αm) = −1       Read(αm) = −1
             Read(βm) = −1       Read(βm) = 1        Read(βm) = −1
             Read(γm) = −1       Read(γm) = −1       Read(γm) = 1
             Dec(αm); Dec(αm)    Dec(βm); Dec(βm)    Dec(γm); Dec(γm)
             Inc(βm); Inc(βm)    Inc(γm); Inc(γm)    Inc(αm); Inc(αm)
Barrier m+1  Inc(z) or Dec(z)    Inc(z) or Dec(z)    Inc(z) or Dec(z)
             Read(z) = 3 or 0    Read(z) = 3 or 0    Read(z) = 3 or 0
Round m+1    Read(y) = n

Fig. 5. The encoding of a 1-in-3 SAT problem $\bigwedge_{i=1}^{m} (\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$
as the history of a counter over 2n+3 replicas. Besides the counter variables $x_j$ encoding
propositional variables $x_j$, the encoding adds a variable y encoding the number of initial
increments and decrements, and a variable z to implement synchronization barriers.


of all pairs formed of write(x,v) and read(x) operations where v belongs to the
return value of the read. By the RETVAL-style axioms, each read(x) operation must be
associated with at least one write(x,_) operation. Also, the fact that each value is
written at most once implies that this rf relation is uniquely defined, e.g., for LWW,
it is not possible to find two write operations that could be rf-related to the
same read operation. In general, if no rf relation satisfies these axioms, then
ComputeRF returns a distinguished value to signal a consistency violation. Note that
the computation of the read-from relation for LWW and MVR is quadratic time⁴, since
the constraints imposed by the axioms relate only to the operation labels, the
methods they invoke, or their arguments. The case of RGA is slightly more involved
because the axiom RETVALRGA introduces more read-from constraints based on the
happens-before order, which includes ro and rf itself. In this case, the computation
of rf relies on a fixpoint computation, described in Algorithm 2, which converges in
at most quadratic time (the maximal size of rf). Essentially, we use the axiom
READFROMRGA to populate the


4 Assuming constant time lookup/insert operations (e.g., using hashmaps), this com- 
plexity is linear time. 
Input: A differentiated history h = (Op, ro) and a datatype T.
Output: true iff h satisfies the axioms of T.

rf ← ComputeRF(h, READFROM[T], RETVAL[T]);
if rf = ⊥ then return false;
hb ← (ro ∪ rf)⁺;
if hb is cyclic or (h, rf, hb) ⊭ READFROMMAXIMAL[T] ∧ READALLMAXIMALS[T] then
    return false;
lin ← hb;
lin ← LinClosure(hb, LIN[T]);
if lin is cyclic then return false;
return true;

Algorithm 1. Consistency checking for RGA, LWW, and MVR. Re...[T]
refers to an axiom of T, or true when T lacks such an axiom. The relation
R⁺ denotes the transitive closure of R.


read-from relation and then, apply the axiom RETVALRGA iteratively, using the 
read-from constraints added in previous steps, until the computation converges. 
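The fixpoint of Algorithm 2 can be sketched in Python as follows, with a hypothetical encoding of RGA operations and a root sentinel named "root" standing for ∘. When several remove(b) operations exist, the sketch simply picks the first one; that choice is our assumption.

```python
def transitive_closure(pairs):
    """Warshall-style transitive closure of a set of pairs."""
    nodes = {x for pair in pairs for x in pair}
    reach = set(pairs)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def compute_rf_rga(ops, ro, root="root"):
    """Sketch of ComputeRF for RGA: returns rf, or None standing for ⊥.

    ops maps an id to ("addAfter", (a, b), None), ("remove", (a,), None)
    or ("read", (), tuple_of_chars).
    """
    adds = {o: args for o, (m, args, _) in ops.items() if m == "addAfter"}
    # Initial rf: addAfter(_, b) is read by operations referring to b or returning b.
    rf = set()
    for o1, (_, b) in adds.items():
        for o2, (m2, args2, ret2) in ops.items():
            if o2 != o1 and ((m2 in ("addAfter", "remove") and args2[0] == b)
                             or (m2 == "read" and b in ret2)):
                rf.add((o1, o2))
    # READFROMRGA: every referenced or returned character needs a source.
    for o2, (m2, args2, ret2) in ops.items():
        needed = list(ret2) if m2 == "read" else [args2[0]]
        if m2 == "addAfter" and args2[0] == root:
            needed = []
        if not all(any((o1, o2) in rf and adds[o1][1] == ch for o1 in adds) for ch in needed):
            return None
    # Fixpoint of RETVALRGA: a read that happens after addAfter(_, b) but does
    # not return b must read from a remove(b).
    while True:
        hb = transitive_closure(rf | set(ro))
        new = set()
        for o1, (m1, _, ret1) in ops.items():
            if m1 != "read":
                continue
            for o2, (_, b) in adds.items():
                if (o2, o1) in hb and b not in ret1:
                    removers = [o3 for o3, (m3, a3, _) in ops.items()
                                if m3 == "remove" and a3[0] == b]
                    if not removers:
                        return None
                    new.add((removers[0], o1))  # assumes one remove(b) suffices
        if new <= rf:
            return rf
        rf |= new
```

In the test, a read issued after addAfter(a,b) on the same replica returns only a, so rf must gain an edge from the remove(b) of another replica; without any remove(b), the procedure returns ⊥.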

After computing the read-from relation, our algorithm defines the happens- 
before relation hb as the transitive closure of ro union rf. This is sound because 
none of the axioms of these datatypes enforce new happens-before constraints, 
which are not already captured by ro and rf. Then, it checks whether the hb 
defined this way is acyclic and satisfies the datatype’s axioms that constrain hb, 
i.e., READFROMMAXIMAL and READALLMAXIMALS (when they are present). 

Finally, in the case of LWW and RGA, the algorithm computes a (partial) 
linearization order that satisfies the corresponding LIN-style axioms. Starting from
an initial linearization order which is exactly the happens-before, it computes 
new constraints by instantiating the universally quantified axioms LINLWW 
and LINRGA. Since these axioms are not “recursive”, i.e., they don’t enforce 
linearization order constraints based on other linearization order constraints, 
a standard instantiation of these axioms is enough to compute a partial lin- 
earization order such that any extension to a total order satisfies the datatype’s 
axioms. 
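For LWW, the instantiation of LINLWW described above can be sketched as a single pass over rf followed by a transitive closure and an acyclicity check (Python, hypothetical names and encoding). The example is the classic pattern where two replicas each read the other's concurrent write, which forces both lin(w1, w2) and lin(w2, w1).

```python
def transitive_closure(pairs):
    """Warshall-style transitive closure of a set of pairs."""
    nodes = {x for pair in pairs for x in pair}
    reach = set(pairs)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def lin_closure_lww(ops, rf, hb):
    """Partial linearization for LWW obtained by instantiating LINLWW once.

    ops maps an id to ("write", (reg, value), None) or ("read", (reg,), value).
    Returns the closure of hb plus the LINLWW constraints, or None if it is cyclic.
    """
    lin = set(hb)
    for o1, o2 in rf:
        reg = ops[o2][1][0]
        for o3, (m3, a3, _) in ops.items():
            # A write to the same register that happens-before the read must be
            # linearized before the write the read takes its value from.
            if o3 not in (o1, o2) and m3 == "write" and a3[0] == reg and (o3, o2) in hb:
                lin.add((o3, o1))
    lin = transitive_closure(lin)
    return None if any(a == b for a, b in lin) else lin
```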


Theorem 3. Algorithm 1 returns true iff the input history is consistent. 


The following corollary holds because Algorithm 1 runs in polynomial time; the
degree of the polynomial depends on the number of quantifiers in the datatype's
axioms. Indeed, Algorithm 1 represents a least fixpoint computation which converges
in at most a quadratic number of iterations (the maximal size of rf).


Corollary 1. The admissibility problem is polynomial time for RGA, and for 
LWW and MVR on differentiated histories. 




Input: A history h = (Op, ro) of RGA.
Output: An rf satisfying READFROMRGA ∧ RETVALRGA, if one exists; ⊥ otherwise.

rf ← {(o1, o2) : meth(o1) = addAfter, meth(o2) ∈ {addAfter, remove, read},
      arg2(o1) = arg1(o2) ∨ arg2(o1) ∈ ret(o2)};
if (h, rf) ⊭ READFROMRGA then return ⊥;
while true do
    rf1 ← ∅;
    foreach o1, o2 ∈ Op s.t. (o2, o1) ∈ (rf ∪ ro)⁺ and meth(o1) = read and
            meth(o2) = addAfter and arg2(o2) ∉ ret(o1) do
        if ∃o3 ∈ Op s.t. meth(o3) = remove and arg(o3) = arg2(o2) then
            rf1 ← rf1 ∪ {(o3, o1)};
        else
            return ⊥;
    if rf1 ⊆ rf then break;
    else rf ← rf ∪ rf1;
return rf;

Algorithm 2. The procedure ComputeRF for RGA.


5 Polynomial-Time Algorithms for Replicated Counters 


In this section, we show that checking consistency for the replicated counter
datatype becomes polynomial time assuming the number of replicas in the input
history is fixed (i.e., the width of the replica order ro is fixed). We present an
algorithm which constructs a valid happens-before order (note that the semantics of
the replicated counter does not constrain the linearization order) incrementally,
following the replica order. At any time, the happens-before order is uniquely
determined by a prefix mapping that associates to each replica a prefix of the history,
i.e., a set of operations which is downward-closed w.r.t. replica order (i.e., if it
contains an operation, it contains all its ro predecessors). This models the fact that
the replica order is included in the happens-before and therefore, if an operation o1
happens-before another operation o2, then all the ro predecessors of o1 happen-before
o2. The happens-before order can be extended in two ways: (1) adding an
operation issued on the replica i to the prefix of replica i, or (2) “merging” the
prefix of a replica j into the prefix of a replica i (this models the delivery of an
operation issued on replica j and all its happens-before predecessors to the replica i).
Verifying that an extension of the happens-before is valid, i.e., that the return
values of newly-added read operations satisfy the RETVALCOUNTER axiom, does not
depend on the happens-before order between the operations in the prefix
associated to some replica (it is enough to count the inc and dec operations in that
prefix). Therefore, the algorithm can be seen as a search in the space of prefix
mappings. If the number of replicas in the input history is fixed, then the number
of possible prefix mappings is polynomial in the size of the history, which implies
that the search can be done in polynomial time.

Let h = (Op, ro) be a history. To simplify the notations, we assume that the
replica order is a union of sequences, each sequence representing the operations
issued on the same replica. Therefore, each operation o ∈ Op is associated with
a replica identifier rep(o) ∈ [1..n_h], where n_h is the number of replicas in h.

Input: History h = (Op, ro), prefix map m, and set seen of invalid prefix maps
Output: true iff there exist read-from and happens-before relations rf and hb
        such that m ⊆ hb, and (h, rf, hb) satisfies the counter axioms.

if m is complete then return true;
foreach replica i do
    foreach replica j ≠ i do
        m′ ← m[i ↦ m(i) ∪ m(j)];
        if m′ ∉ seen and checkCounter(h, m′, seen) then
            return true;
        seen ← seen ∪ {m′};
    if ∃o1. ro¹(last_i(m), o1) then
        if meth(o1) = read and arg(o1) = x ∧
                ret(o1) ≠ |{o ∈ m(i) | o = inc(x)}| − |{o ∈ m(i) | o = dec(x)}| then
            return false;
        m′ ← m[i ↦ m(i) ∪ {o1}];
        if m′ ∉ seen and checkCounter(h, m′, seen) then
            return true;
        seen ← seen ∪ {m′};
return false;

Algorithm 3. The procedure checkCounter, where ro¹ denotes immediate
ro-successor, and f[a ↦ b] updates function f with mapping a ↦ b.

A prefix of h is a set of operations Op′ ⊆ Op such that all the ro-predecessors
of operations in Op′ are also in Op′, i.e., ∀o ∈ Op′. ro⁻¹(o) ⊆ Op′. Note that the
union of two prefixes of h is also a prefix of h. The last operation of replica i in
a prefix Op′ is the ro-maximal operation o with rep(o) = i included in Op′. A
prefix Op′ is called valid if (Op′, ro′), where ro′ is the projection of ro on Op′, is
admitted by the replicated counter.
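Because ro is a union of per-replica sequences, a prefix is fully described by how many operations of each replica it contains. The following minimal Python sketch illustrates this under an assumed encoding (operation k of replica r is the pair (r, k); the helper names are hypothetical): a prefix is a count vector, the union of two prefixes is the componentwise maximum, and the number of prefixes is the product of (length + 1) over replicas.

```python
from itertools import product
from typing import List, Set, Tuple

# Assumed encoding: operation k of replica r is the pair (r, k), numbered in
# ro order; a prefix is the vector of per-replica operation counts.
Prefix = Tuple[int, ...]


def to_set(p: Prefix) -> Set[Tuple[int, int]]:
    """Expand a count vector into the set of (replica, index) operations."""
    return {(r, k) for r, count in enumerate(p) for k in range(count)}


def is_prefix(ops: Set[Tuple[int, int]]) -> bool:
    """Downward closure w.r.t. ro: (r, k) in ops implies (r, k - 1) in ops."""
    return all((r, k - 1) in ops for r, k in ops if k > 0)


def union(p: Prefix, q: Prefix) -> Prefix:
    """Union of two prefixes: keep the longer sequence of each replica."""
    return tuple(max(a, b) for a, b in zip(p, q))


def all_prefixes(lengths: List[int]) -> List[Prefix]:
    """Every prefix of a history whose replicas have the given lengths."""
    return list(product(*(range(n + 1) for n in lengths)))
```

For example, union((1, 2), (2, 0)) is (2, 2), whose expansion equals the set union of the two expanded prefixes, and a history with replica lengths 2 and 3 has 3 × 4 = 12 prefixes.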

A prefix map is a mapping m which associates a prefix of h to each replica
i ∈ [1..n_h]. Intuitively, a prefix map defines for each replica i the set of
operations which are "known" to i, i.e., happen-before the last operation of i in
its prefix. Formally, a prefix map m is included in a happens-before relation hb,
denoted by m ⊆ hb, if for each replica i ∈ [1..n_h], hb(o, o_i) for each operation
o ∈ m(i) \ {o_i}, where o_i is the last operation of i in m(i). We call o_i the last
operation of i in m, and denote it by last_i(m). A prefix map m is valid if it
associates a valid prefix to each replica, and complete if it associates the whole
history h to each replica i.

Algorithm 3 lists our algorithm for checking consistency of replicated counter
histories. It is defined as a recursive procedure checkCounter that searches for
a sequence of valid extensions of a given prefix map (initially, this prefix map
is empty) until it becomes complete. The axiom RETVALCOUNTER is enforced
whenever the prefix map is extended with a new read operation (when the last
operation of a replica i is "advanced" to a read operation). The following theorem
states the correctness of the algorithm.


Theorem 4. checkCounter(h, ∅, ∅) returns true iff the input history is consistent.
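To make the search concrete, here is a minimal Python sketch of a prefix-map search for replicated counters. It assumes a hypothetical encoding: each operation is a tuple (method, counter, return value), history[i] lists replica i's operations in replica order, a prefix is a per-replica count vector, and a prefix map is one prefix per replica. For simplicity the sketch exhaustively tries every merge and every advance and merely skips an advance whose read violates RETVALCOUNTER; it is a brute-force search in the spirit of Algorithm 3, not a line-by-line transcription of it.

```python
from typing import List, Optional, Tuple

# Assumed encoding: ("inc", "x", None), ("dec", "x", None), ("read", "x", 2).
Op = Tuple[str, str, Optional[int]]
Prefix = Tuple[int, ...]          # operations of each replica that are known
PrefixMap = Tuple[Prefix, ...]    # one prefix per replica


def check_counter(history: List[List[Op]]) -> bool:
    n = len(history)
    full: Prefix = tuple(len(seq) for seq in history)
    seen = set()  # prefix maps already explored without success

    def value(prefix: Prefix, x: str) -> int:
        """Count inc(x) minus dec(x) among the operations of the prefix."""
        delta = {"inc": 1, "dec": -1}
        return sum(delta.get(meth, 0)
                   for r in range(n)
                   for meth, arg, _ in history[r][:prefix[r]]
                   if arg == x)

    def search(m: PrefixMap) -> bool:
        if all(p == full for p in m):       # complete prefix map
            return True
        if m in seen:
            return False
        seen.add(m)
        for i in range(n):
            # (2) merge the prefix of replica j into the prefix of replica i
            for j in range(n):
                merged = tuple(max(a, b) for a, b in zip(m[i], m[j]))
                if j != i and merged != m[i]:
                    if search(m[:i] + (merged,) + m[i + 1:]):
                        return True
            # (1) extend replica i's prefix with its next own operation
            k = m[i][i]
            if k < full[i]:
                meth, arg, ret = history[i][k]
                if meth == "read" and ret != value(m[i], arg):
                    continue  # RETVALCOUNTER fails for this read here
                advanced = m[i][:i] + (k + 1,) + m[i][i + 1:]
                if search(m[:i] + (advanced,) + m[i + 1:]):
                    return True
        return False

    empty: Prefix = tuple(0 for _ in range(n))
    return search(tuple(empty for _ in range(n)))
```

Every move strictly grows some prefix, so the search terminates; with a fixed number of replicas the set of reachable prefix maps, and hence the work, is polynomial in the history size. For instance, a history where replica 0 reads x = 1 before incrementing x, and replica 1 does the same, is rejected: each read would need to see the other replica's later increment.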


When the number of replicas is fixed, the number of prefix maps becomes 
polynomial in the size of the history. This follows from the fact that prefixes are 
uniquely defined by their ro-maximal operations, whose number is fixed. 
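The count can be made explicit. Writing n for the number of replicas and k_i for the number of operations of replica i (so that the k_i sum to |Op|), a prefix is fixed by choosing how many operations of each replica it contains, and a prefix map chooses one prefix per replica:

```latex
\#\{\text{prefixes of } h\} = \prod_{i=1}^{n} (k_i + 1) \le (|Op| + 1)^{n},
\qquad
\#\{\text{prefix maps}\} \le \big((|Op| + 1)^{n}\big)^{n} = (|Op| + 1)^{n^{2}}
```

For fixed n both bounds are polynomial in |Op|, and each recursive call of checkCounter considers at most n(n − 1) merges and n advances.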


Corollary 2. The admissibility problem for replicated counters is polynomial- 
time when the number of replicas is fixed. 


6 Polynomial-Time Algorithms for Sets and Flags 


While Theorem 1 shows that the admissibility problem is NP-complete for
replicated sets and flags even if the number of replicas is fixed, we show that
this problem becomes polynomial time when, additionally, the number of values
added to the set, or the number of flags, is also fixed. Note that this doesn't
limit the number of operations in the input history, which can still be
arbitrarily large. In the following, we focus on the Add-Wins Set, the other
cases being very similar.

We propose an algorithm for checking consistency which is actually an extension
of the one presented in Sect. 5 for replicated counters. The additional
complexity in checking consistency for the Add-Wins Set comes from the validity
of contains(x) return values, which requires identifying the maximal predecessors
in the happens-before relation that add or remove x (which are not necessarily
the maximal hb-predecessors altogether). In the case of counters, it was enough
just to count happens-before predecessors. Therefore, we extend the algorithm
for replicated counters such that, along with the prefix map, we also keep track
of the hb-maximal add(x) and remove(x) operations for each element x and
each replica i. When extending a prefix map with a contains operation, these
hb-maximal operations (which define a witness for the read-from relation) are
enough to verify the RETVALSET axiom. Extending the prefix of a replica with
an add or remove operation (issued on the same replica), or by merging the prefix
of another replica, may require an update of these hb-maximal predecessors.
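Why the hb-maximal add/remove operations suffice can be illustrated with a small Python sketch. It assumes the usual Add-Wins semantics (a remove(x) cancels only the add(x) operations it has observed) and a hypothetical encoding: operations are string ids, kind and elem give each operation's method and element, and hb is a transitively closed set of pairs. Under these assumptions, x is present iff some hb-maximal add/remove of x in the causal past is an add. These maximal operations are pairwise concurrent, and since ro is total on each replica there is at most one of them per replica.

```python
from typing import Dict, Set, Tuple

# Assumed encoding: hb contains (a, b) when a happens-before b; it is taken
# to be transitively closed.


def hb_maximal(ops: Set[str], hb: Set[Tuple[str, str]]) -> Set[str]:
    """Operations in ops with no hb-successor inside ops."""
    return {o for o in ops if not any((o, p) in hb for p in ops)}


def contains_add_wins(past: Set[str], x: str, kind: Dict[str, str],
                      elem: Dict[str, str], hb: Set[Tuple[str, str]]) -> bool:
    """Expected result of a contains(x) whose hb-predecessors are `past`."""
    updates = {o for o in past
               if elem[o] == x and kind[o] in ("add", "remove")}
    # An add that is hb-maximal was observed by no remove, so x is present;
    # if all maximal updates are removes, every add has been cancelled.
    return any(kind[o] == "add" for o in hb_maximal(updates, hb))
```

With a1 = add(x) happening before r1 = remove(x), a contains(x) seeing {a1, r1} returns false, while one that additionally sees a concurrent a2 = add(x) returns true.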

When the number of replicas and elements is fixed, the number of read-from
maps is polynomial in the size of the history; recall that the number of
operations associated by a read-from map to a replica and set element is
bounded by the number of replicas. Combined with the number of prefix maps
being polynomial when the number of replicas is fixed, we obtain the following
result.


Theorem 5. Checking whether a history is admitted by the Add-Wins Set, 
Remove-Wins Set, Enable-Wins Flag, or the Disable-Wins Flag is polynomial 
time provided that the number of replicas and elements/flags is fixed. 
7 Related Work


Many have considered consistency models applicable to CRDTs, including causal 
consistency [26], sequential consistency [27], linearizability [24], session consis- 
tency [35], eventual consistency [36], and happens-before consistency [29]. Bur- 
ckhardt et al. [8,11] propose a unifying framework to formalize these models. 
Many have also studied the complexity of verifying data-type agnostic notions 
of consistency, including serializability, sequential consistency and linearizabil- 
ity [1,2,4,18,20,22,30], as well as causal consistency [6]. Our definition of the 
replicated LWW register corresponds to the notion of causal convergence in [6]. 
This work studies the complexity of the admissibility problem for the repli- 
cated LWW register. It shows that this problem is NP-complete in general and 
polynomial time when each value is written only once. Our NP-completeness 
result is stronger since it assumes a fixed number of replicas, and our algo- 
rithm for the case of unique values is more general and can be applied uni- 
formly to MVR and RGA. While Bouajjani et al. [5,14] consider the com- 
plexity for individual linearizable collection types, we are the first to establish 
(in)tractability of individual replicated data types. Others have developed effec- 
tive consistency checking algorithms for sequential consistency [3,9,23,31], seri- 
alizability [12,17,18,21], linearizability [10,16,28,37], and even weaker notions 
like eventual consistency [7] and sequential happens-before consistency [13, 15]. 
In contrast, we are the first to establish precise polynomial-time algorithms for 
runtime verification of replicated data types. 


8 Conclusion 


By developing novel logical characterizations of replicated data types, reduc- 
tions from propositional satisfiability checking, and tractable algorithms, we have 
established a frontier of tractability for checking consistency of replicated data 
types. As far as we are aware, our results are the first to characterize the
asymptotic complexity of consistency checking for CRDTs.
Abstract. The verification of asynchronous fault-tolerant distributed
systems is challenging due to unboundedly many interleavings and network
failures (e.g., process crashes or message loss). We propose a method
that reduces the verification of asynchronous fault-tolerant protocols to 
the verification of round-based synchronous ones. Synchronous protocols 
are easier to verify due to fewer interleavings, bounded message buffers 
etc. We implemented our reduction method and applied it to several state 
machine replication and consensus algorithms. The resulting synchronous 
protocols are verified using existing deductive verification methods. 


1 Introduction 


Fault tolerance protocols provide dependable services on top of unreliable com- 
puters and networks. One distinguishes asynchronous vs. synchronous pro- 
tocols based on the semantics of parallel composition. Asynchronous proto- 
cols are crucial parts of many distributed systems for their better perfor- 
mance when compared against the synchronous ones. However, their correct- 
ness is very hard to obtain, due to the challenges of concurrency, faults, 
buffered message queues, and message loss and re-ordering at the network 
[5, 19,21, 26,31,35,37,42]. In contrast, reasoning about synchronous round-based 
semantics is simpler, as one only has to consider specific global states at round 
boundaries [1,8,10,11,13,17,29,32, 40]. 

The question we address is how to connect both worlds, in order to exploit 
the advantage of verification in synchronous semantics when reasoning about 
asynchronous protocols. We consider asynchronous protocols that work in unre- 
liable networks, which may lose and reorder messages, and where processes may 
crash. We focus on a class of protocols that solve state machine replication. 

Due to the absence of a global clock, fault tolerance protocols implement an 
abstract notion of time to coordinate. The local state of a process maintains the 
value of the abstract time (potentially implicit), and a process timestamps the
messages it sends accordingly. Synchronous algorithms do not need to imple- 
ment an abstract notion of time: it is embedded in the definition of any syn- 
chronous computational model [9, 15,18,28], and it is called the round number. 
The key insight of our results is the existence of a correspondence between val- 
ues of the abstract clock in the asynchronous systems and round numbers in 
the synchronous ones. Using this correspondence, we make explicit the “hidden” 
round-based synchronous structure of an asynchronous algorithm. 


[Figure: processes P1, P2, P3 exchanging NewBallot 1, AckBallot 1, NewBallot 2, and AckBallot 2 messages; panels (a) and (b).]

Fig. 1. Asynchronous executions without jumps


[Figure: processes P1, P2, P3 over Ballot 1, Ballot 2, ..., Ballot 19, Ballot 20; labels include NewBallot 20, AckBallot 20, and out(20, p2); panels (a) and (b).]

Fig. 2. Asynchronous executions with jumps


We discuss our approach using a leader election algorithm. We consider n
processes, which periodically and collectively elect a new leader. These periods
are called ballots, and in each ballot at most one leader should be elected. The
protocol in Fig. 3 solves leader election. In a ballot, a process that wants to
become leader proposes itself by sending a message containing its identifier me
to all, and it is elected if (1) a majority of processes receive its message, (2) these
receivers send a message of leadership acknowledgment to the entire network,
and (3) at least one process receives leadership acknowledgments for its leader
estimate from a majority of processes. Figure 1(b) sketches an execution where
process P3 fails to be elected in ballot 1 because the network drops all the
messages sent by P3 that are marked with a cross. All processes time out and
no leader is elected in ballot 1. In the second ballot, P2 tries to become leader,
the network delivers all messages between P1 and P2 in time, the two processes
form a majority, and P2 is elected leader of ballot 2.

The protocol is defined by the asynchronous parallel composition of n copies
of the code in Fig. 3. Each process executes a loop, where each iteration defines
the process's behavior in a ballot. The variable ballot encodes the ballot number.
The function coord() provides a local estimate of whether a process should
try to become leader. Multiple processes may be selected by coord() as leader 
log = NULL; mbox = NULL; ballot = 0;
while(true)
  if(coord() == me){                              // leader branch (left)
    ballot++; label = NewBallot;
    msg* m = new msg(ballot,label,me);
    //@assert m->bal==ballot && m->lab==label
    send(m,*);
    leader = me;
    label = AckBallot;
    msg* m = new msg(ballot,label,leader);
    send(m,*);
    while(true){
      m = recv(eq(ballot,label));
      //@ assert m->bal >= ballot && m->lab >= label
      add(mbox, m);
      if ((mbox!=0 && mbox->size>n/2) || timeout()) break;
    }
    if (mbox!=0 && mbox->size>n/2 && all_same(mbox, leader)){
      //@assert (equal(mbox, ballot, label));
      add(log, new(ballot, leader));
      out(ballot, leader); }
    mbox = NULL;
  } else {                                        // follower branch (right)
    ballot++; label = NewBallot;
    while(true){
      msg* m = recv(geq(ballot, label));
      add(mbox, m);
      if ((mbox!=0 && mbox->size==1) || timeout()) break;
    }
    if (mbox!=0 && mbox->size==1){
      ballot = mbox->message->bal;
      leader = mbox->message->sender; }
    mbox = NULL;
    label = AckBallot;
    msg* m = new msg(ballot,label,leader);
    send(m,*);
    while(true){
      m = recv(eq(ballot,label));
      add(mbox, m);
      if ((mbox!=0 && mbox->size>n/2) || timeout()) break;
    }
    if (mbox!=0 && mbox->size>n/2 && all_same(mbox, leader)){
      add(log, new(ballot, leader));
      out(ballot, leader); }
    mbox = NULL;
  }

struct msg { int bal; enum St lab; Pid sender; }

msg* eq(int b, enum St l){
  msg* m = recv();
  if (m->bal == b && m->lab == l) return m;
  else return NULL; }

Fig. 3. Control flow graph of asynchronous leader election. (Color figure online)


candidates, resulting in a race which is won by a process that is acknowledged
by a majority (more than n/2 processes). Depending on the result of coord(),
a process may take the leader branch on the left or the follower branch on the
right. On the leader branch, a message is prepared and sent at line 7. The
message contains the ballot number, the label NewBallot, and the leader's identity.
On the other branch, a follower waits for a message from a process which proposes
itself for the current ballot number of the follower. This waiting is implemented
by a loop, which terminates either on timeout or when a message is received.
Next, the followers that received a message, and the leader candidates, send
their leader estimate to all at lines 12 and 41, where the message contains the
ballot number, the label AckBallot, and the leader's identity. If a process
receives more than n/2 messages labeled with AckBallot and its current ballot,
it checks, using all_same(mbox, leader) in lines 22 and 49, whether a majority
of processes acknowledges the leadership of its estimate. In this case, it adds
this information to the array log (which stores the locally elected leader of each
ballot, if any) and outputs it, before it empties its mailbox and continues with
the next iteration.

Figure 1(a) shows another execution of this protocol. Again, P3 sends
NewBallot messages for ballot 1 to all processes. P3's NewBallot messages are
delayed, and P2 times out in ballot 1, moving to ballot 2 where it is a leader
candidate. The messages sent in ballot 2 are exchanged as in Fig. 1(b). Contrary
to Fig. 1(b), while exchanging ballot 2 messages, the network delivers P3's
NewBallot message from ballot 1 to P2. However, P2 ignores it, because of
the receive statement in line 14 that only accepts messages for greater or equal
(ballot, label) pairs. The message from ballot 1 arrived too "late" because P2
already is in ballot 2. Thus, the messages from ballot 1 have the same effect as if
they were dropped, as in Fig. 1(b). The executions are equivalent from the local
perspective of the processes: by applying a "rubber band transformation" [30],
one can reorder transitions while maintaining the local control flow and the
send/receive causality.

Another case of equivalent executions is given in Fig. 2. While P1 and P2
made progress, P3 was disconnected. In Fig. 2(a), while P3 is waiting for ballot 1
messages, the network delivers a message for ballot 20. P3 receives this message
in line 29 and updates ballot in line 35. P3 thus "jumps forward in time",
acknowledging P2's leadership in ballot 20. In Fig. 2(b), P3's timeout expires in
all ballots from 1 to 19, without P3 receiving any messages. Thus, it does not
change its local state (except the ballot number) in these ballots. For P3, these
two executions are stutter equivalent. Reducing verification to the verification of
executions like the ones on the right (i.e., synchronous executions) reduces the
number of interleavings and drastically simplifies verification. In the following,
we discuss conditions on the code that allow such a reduction.


Communication Closure. In our example, the variables ballot and label encode
abstract time: let b and ℓ be their assigned values. Then abstract time ranges
over T = {(b, ℓ) : b ∈ ℕ, ℓ ∈ {NewBallot, AckBallot}}. We fix NewBallot to
be less than AckBallot, and consider the lexicographical order over T. The
sequence of (b, ℓ) induced by an execution at a process is monotonically increasing;
thus (b, ℓ) encodes a notion of time. A protocol is communication-closed if
(i) each process sends only messages timestamped with the current time, and (ii)
each process receives only messages timestamped with the current or a higher
time value. For such protocols we show in Sect. 5 that for each asynchronous
execution, there is an equivalent (processes go through the same sequence of local
states) synchronous one. We use ideas from [17], but we allow reacting to future
messages, which is a more permissive form of communication closure. This is
essential for jumping forward, and thus for liveness in fault tolerance protocols.
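The abstract time and the two communication-closure conditions can be written down directly. The Python sketch below is an illustration only: the names Label, may_send, may_receive, and is_monotone are hypothetical, and Python's built-in lexicographic tuple comparison stands in for the order on T.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Label(IntEnum):
    # NewBallot is fixed to be less than AckBallot.
    NEW_BALLOT = 0
    ACK_BALLOT = 1


# Abstract time (ballot, label); Python compares tuples lexicographically.
Time = Tuple[int, Label]


def may_send(now: Time, stamp: Time) -> bool:
    """Condition (i): a sent message carries exactly the current time."""
    return stamp == now


def may_receive(now: Time, stamp: Time) -> bool:
    """Condition (ii): only messages from the current or a later time are accepted."""
    return stamp >= now


def is_monotone(trace: Iterable[Time]) -> bool:
    """The local sequence of (ballot, label) values never decreases."""
    times = list(trace)
    return all(a <= b for a, b in zip(times, times[1:]))
```

In the scenario of Fig. 1(a), P2 at time (2, NewBallot) fails may_receive for P3's ballot-1 message, so the late message is ignored. In Fig. 2(a), P3 at (1, NewBallot) accepts a ballot-20 message, which is exactly the forward jump that this permissive form of closure allows.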

The challenge is to check communication closure at the code level. For this, 
we rely on user-provided “tag” annotations that specify the variables and the 
message fields representing local time and timestamps. A system of assertions 
formalizes that the user-provided annotations encode time and that the protocol 
is communication-closed w.r.t. this definition of time. In the example, the user 
provides (ballot, label) for local time and msg->bal and msg->lab for times- 
tamps. In Fig. 3, we give example assertions that we add for the send and receive 
conditions (i) and (ii). These assertions only consider the local state, i.e., we do 
not need to capture the states of other processes or the message pool. We check 
the assertions with the static verifier Verifast [22]. 


Synchronous Semantics. Central to our approach is rewriting communication-closed
asynchronous protocols into synchronous ones. To formalize synchronous
semantics we introduce multi Heard-Of protocols, mHO for short. An mHO
computation is structured into a sequence of mHO-rounds that execute
synchronously. Figure 4 is an example of an mHO protocol. It has two mHO-rounds:
NewBallot and AckBallot. Within a round, SEND functions, resp. UPDATE
functions, are executed synchronously across all processes. The round number
r is initially 0 and it is incremented after each execution of an mHO-round.
The interesting feature, which models faults and timeouts, is given by the
heard-of sets HO [9]. For each round r and each process p, the set HO(p, r)
contains the processes that p hears of in round r, i.e., whose messages are in the
mailbox taken as parameter by UPDATE (mbox). If the message from q to p is lost
in round r, then q ∉ HO(p, r). Figures 1(b) and 2(b) are examples of executions
of the protocol in Fig. 4. We extend the HO model [9] by allowing composition
of multiple protocols. Verification in synchronous semantics, and thus in mHO,
is simpler due to the round structure, which entails (i) no interleavings, (ii) no
message buffers, and (iii) simpler invariants at the round boundaries.
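To illustrate how HO sets model message loss, here is a minimal Python sketch of one mHO round together with a rendering of the two rounds of Fig. 4. It rests on several assumptions, and all names are hypothetical. States are dicts. A mailbox maps each sender to its payload. coord() is replaced by a fixed coordinator value shared by all processes. Only a single phase (phase 0) is run.

```python
import copy
from typing import Callable, Dict, Set

Pid = int
State = dict
Send = Callable[[Pid, State], Dict[Pid, object]]          # receiver -> payload
Update = Callable[[Pid, State, Dict[Pid, object]], State]  # mailbox: sender -> payload


def run_round(states: Dict[Pid, State], send: Send, update: Update,
              ho: Dict[Pid, Set[Pid]]) -> Dict[Pid, State]:
    """One mHO round: all SENDs, delivery filtered by HO sets, then all UPDATEs."""
    outgoing = {q: send(q, copy.deepcopy(s)) for q, s in states.items()}
    new_states = {}
    for p, s in states.items():
        # q's message to p is delivered only if q is in HO(p, r); otherwise it is lost.
        mbox = {q: outgoing[q][p] for q in ho[p] if p in outgoing[q]}
        new_states[p] = update(p, copy.deepcopy(s), mbox)
    return new_states


def new_ballot_send(p, s):
    return {q: p for q in range(s["n"])} if s["coord"] == p else {}


def new_ballot_update(p, s, mbox):
    s["old_mbox1"] = mbox
    if s["coord"] != p:
        if len(mbox) == 1:
            s["leader"] = next(iter(mbox))  # sender of the unique message
    else:
        s["leader"] = p
    return s


def takes_part(p, s):
    return (len(s["old_mbox1"]) == 1 and s["leader"] != p) or s["leader"] == p


def ack_ballot_send(p, s):
    return {q: s["leader"] for q in range(s["n"])} if takes_part(p, s) else {}


def ack_ballot_update(p, s, mbox):
    if (takes_part(p, s) and len(mbox) > s["n"] / 2
            and all(v == s["leader"] for v in mbox.values())):
        s["log"][s["phase"]] = s["leader"]
    return s


def one_phase(n, coord, ho_new, ho_ack):
    states = {p: {"n": n, "coord": coord, "leader": None, "old_mbox1": {},
                  "log": {}, "phase": 0} for p in range(n)}
    states = run_round(states, new_ballot_send, new_ballot_update, ho_new)
    return run_round(states, ack_ballot_send, ack_ballot_update, ho_ack)
```

With three processes, coordinator 0, and full HO sets, every process logs leader 0 for phase 0. If process 2 hears nobody in the NewBallot round, it stays silent in AckBallot and logs nothing. Processes 0 and 1 still form a majority and agree on leader 0.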


log = NULL; ballot = 0;

NewBallot Round:
SEND(){
  if(coord() == me){
    msg m = new msg(me);
    send(m,*);}}
UPDATE(mbox: list(msg)){
  old_mbox1 = mbox;
  if(coord() != me){
    if (mbox!=0 && mbox->size==1)
      leader = mbox->message->sender; }
  else leader = me; }

AckBallot Round:
SEND(){
  if ((old_mbox1!=0 && old_mbox1->size==1
       && leader!=me) || leader == me){
    msg m = new msg(leader);
    send(m,*);}}
UPDATE(mbox: list(msg)){
  if((old_mbox1!=0 && old_mbox1->size==1
      && leader!=me) || leader == me)
    if (mbox!=0 && mbox->size>n/2
        && all_same(mbox, leader)){
      add(log, new(phase, leader));
      out(phase, leader); }}

Fig. 4. Control flow graph of synchronous leader election. (Color figure online)


Rewriting to mHO. We introduce a procedure that takes as input the asyn- 
chronous protocol together with tag annotations that have been checked, and 
produces the protocol rewritten in mHO, e.g., Fig. 3 is rewritten into Fig. 4. The 
rewriting is based on the idea of matching abstract time (ballot, label) to mHO 
round numbers r. Roughly, mHO-round NewBallot is obtained by combining the
code of the first box on each path in Fig. 3 (the red boxes) and AckBallot is
obtained by combining the second box on each path (the blue ones) as follows.
The three message reception loops (the code in the boxes with highlighted back- 
ground) are removed, because receptions are implicit in mHO; they correspond 
to a non-deterministic parameter of the UPDATE function. For each round, we 
record the context in which it is executed, e.g., the lower box for the follower is 
executed only if a NewBallot message was received (more details in Sect. 6). 
Verification. The specification of the running example is that if two processes
find the leader election for a ballot b successful (i.e., there is a log entry for b), then
they agree on the leader. In general, to prove the specification, we need invariants 
that quantify over the ballot number b. As processes decide asynchronously, the 
proof of ballot 1, for some process p, must refer to the first entry of log of 
processes that might already be in ballot 400. As discussed in [38], in general 
invariants need to capture the complete message history and the complete local 
state of processes. The proof of the same property for the synchronous protocol 
requires no such invariant. Due to communication closure, no messages need to 
be maintained after a round terminated, that is, there is no message pool. The 
rewritten synchronous code has a simpler correctness proof, independent of the 
chosen verification method. One could use model checking [1, 29,39, 40], theorem 
prover approaches [8,11], or deductive verification [14] for synchronous systems. 

For several protocols, we formalized their specification in Consensus Logic [13],
computed the equivalent mHO protocol, and proved it correct using the
existing deductive verification engine from [13].


2 Asynchronous Protocols 


All processes execute the same code, written in the core language in Fig. 5. The
communication between processes is done via typed messages. Message payloads,
denoted M, are wrappers of primitive or composite type. We denote by 𝓜 the set
of message types. Wrappers are used to distinguish payload types. Send
instructions take as input an object of some payload type and the receiver's
identity, or * corresponding to a send to all. Receive statements are non-blocking,
and return an object of payload type or NULL. Receive statements are
parameterized by conditions (i.e., pointers to functions) on the values in the
received messages (e.g., timestamp). At most one message is received at a time.
If no message has been delivered or satisfies the condition, receive returns NULL.
In Fig. 3, we give the definition of the function eq, used to filter messages
acknowledging the leadership of a process. The followers also use geq, which
checks if the received message is timestamped with a value higher than or equal
to the local time. We assume that each loop contains at least one send or receive
statement. The iterative sequential computations are done in local functions,
i.e., f(ē). The instructions in() and out() are used to communicate with an
external environment.
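The behavior of a conditional, non-blocking receive can be sketched in Python as follows. This is an illustration under assumptions, and the names are hypothetical. The message pool is a list whose order stands in for the network's arbitrary delivery order. Messages that fail the condition are left in the pool. (The eq function of Fig. 3 instead takes an arbitrary message and returns NULL if it does not match.) Python's None plays the role of NULL.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

NEW_BALLOT, ACK_BALLOT = 0, 1  # labels, NewBallot < AckBallot


@dataclass(frozen=True)
class Msg:
    bal: int
    lab: int
    sender: int


Cond = Callable[[Msg], bool]


def recv(pool: List[Msg], cond: Cond) -> Optional[Msg]:
    """Non-blocking receive: remove and return one message satisfying cond, else None."""
    for idx, m in enumerate(pool):
        if cond(m):
            return pool.pop(idx)
    return None


def eq(b: int, lab: int) -> Cond:
    """Accept only messages timestamped exactly (b, lab)."""
    return lambda m: m.bal == b and m.lab == lab


def geq(b: int, lab: int) -> Cond:
    """Accept messages timestamped (b, lab) or later, in lexicographic order."""
    return lambda m: (m.bal, m.lab) >= (b, lab)
```

For example, with a pool holding a ballot-1 NewBallot message and a ballot-2 AckBallot message, recv(pool, eq(2, ACK_BALLOT)) removes and returns the latter. A subsequent recv(pool, geq(2, NEW_BALLOT)) returns None, because the remaining message is late.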

The semantics of a protocol P is the asynchronous parallel composition of n
copies of the same code, one copy per process, where n is a parameter. Formally,
the state of a protocol P is a tuple (s, msg) where: s ∈ [P → (Vars ∪ {pc}) → D]
is a valuation in some data domain D of the variables in P, where pc represents
the current control location, ranging over the set Loc of all protocol locations,
and msg ⊆ ⋃_{M ∈ 𝓜} (P × D_M × P) is the multiset of messages in transit (the
network may lose and reorder messages). Given a process p ∈ P, s(p) is the local
state of p, which is a valuation of p's local variables, i.e., s(p) ∈ Vars_p ∪ {pc_p} → D.
The state of a crashed process is a wildcard state that matches any state. The
messages sent by a process are added to the global pool of messages msg, and
e ::= c                          constant
    | x                          variable
    | f(ē)                       operation
p : Pid    process identity     m : M    payload type     Mbox : set of M
S ::= x := e                     assignment
    | reset timeout(e)           reset a timeout
    | send(m, p) | send(m, *)    send message
    | m := recv(*cond)           receive message
    | S; S                       sequence
    | if e then S else S
    | while true S
    | break | continue
    | x = in()                   client entry
    | out(e)                     client output
P ::= ||_{p ∈ P} [S]_p           protocol
P is the set of process identities

Fig. 5. Syntax of asynchronous protocols.


a receive statement removes a message from the pool. The interface operations
in and out do not modify the local state of a process. An execution is an
infinite sequence s0 A0 s1 A1 ... such that ∀i ≥ 0, si is a protocol state and
Ai ∈ A is a local statement whose execution creates a transition of the form
(s, msg) →_{Ai}^{I,O} (s′, msg′), where {I, O} are the observable events generated
by Ai (if any). We denote by [P] the set of executions of the protocol P.


3 Round-Based Model: mHO 


Intra-procedural. mHO captures round-based distributed algorithms and is a 
reformulation of the model in [9]. All processes execute the same code and the 
computation is structured in rounds. We denote by P the set of processes and 
n = |P| is a parameter. The central concept is the HO-set, where HO(p, r)
contains the processes from which process p has heard of (i.e., has received
messages from) in round r; this models faults and timeouts.


Syntax. An mHO protocol consists of variable declarations (Vars is the set of
variables), an initialization method init, and a non-empty sequence of rounds,
called phase; cf. Fig. 6. A phase is a fixed-size array of rounds. Each round has
a send and update method, parameterized by a type M (denoted by round_M) which
represents the message payload. The method SEND has no side effects and returns
the messages to be sent based on the local state of each sender; it returns a
partial map from receivers to payloads. The method UPDATE takes as input the
received messages and updates the local state of a process. It may communicate
with an external client via in and out. For data computations, UPDATE uses
iterative control structures only indirectly via sequential functions, e.g.,
all_same(mbox, leader) in Fig. 3, which checks whether the payloads of all
messages in mbox are equal to the local leader estimate.

protocol  ::= interface var_decl* init phase
interface ::= in: () → type | out: type → ()
init      ::= init: () → [P → Vars → D]
phase     ::= round_M+
round_M   ::= SEND: [P → Vars] → [P ⇀ M]
              UPDATE: [P ⇀ M] × [P → Vars] → [P → Vars]

Fig. 6. mHO syntax.
Semantics. The set of executions of an mHO protocol is defined by the execution,
in a loop, of SEND followed by UPDATE for each round in the phase array. The
initial configuration is defined by init. There are three predefined execution
counters: the phase number, which is increased after a phase has been executed;
the step number, which tracks which mHO-round is executed in the current phase;
and the round number, which counts the total number of rounds executed so far
and is defined by the phase times the length of the phase array, plus the step.
A protocol state is a tuple (SU, s, r, msg, P, HO) where: P is the set of
processes, SU ∈ {SEND, UPDATE} indicates the next transition, s ∈ [P → Vars → D]
stores the process local states, r ∈ ℕ is the round number, msg ⊆ P × D_M × P
stores the in-transit messages, where M is the type of the message payload, and
HO ∈ [P → 2^P] evaluates the HO-sets for the current round. After the
initialization, an execution alternates SEND and UPDATE transitions. In the SEND
transition, all processes send messages, which are added to a pool of messages
msg, without modifying the local states. The values of the HO-sets are updated
non-deterministically to be a subset of P. A message is lost if the sender's
identity does not belong to the HO-set of the receiver. In an UPDATE transition,
UPDATE is applied at each process, taking as input the set of messages received
by that process in that round. If the processes communicate with an external
process, then UPDATE might produce observable events op. These events correspond
to calls to in, which returns an input value, and to out, which sends the value
given as parameter to the client. At the end of the round, msg is purged and r
is incremented. Figure 1(b) shows an execution of the mHO algorithm in Fig. 4.
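The relation between the three counters is a simple positional encoding, sketched below in Python. The function names are hypothetical. Fig. 4 has a phase array of length 2 (NewBallot, AckBallot), so step 1 of phase 3 is round 7, and the inverse is integer division with remainder.

```python
from typing import Tuple


def round_number(phase: int, step: int, phase_len: int) -> int:
    """r = phase * (length of the phase array) + step."""
    assert 0 <= step < phase_len
    return phase * phase_len + step


def phase_and_step(r: int, phase_len: int) -> Tuple[int, int]:
    """Recover (phase, step) from the global round number r."""
    return divmod(r, phase_len)
```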


Inter-procedural. The model introduced so far allows one to express a single
protocol, e.g., a leader election protocol (Fig. 4). However, realistic systems
typically combine several protocols, e.g., we can transform Fig. 4 into a replicated state
machine protocol, by allowing processes to enter an atomic broadcast protocol 
in every ballot where a leader is elected successfully. Figure 7 sketches such an 
execution, where in the update of round AckBallot, a subprotocol is called; its 
execution is sketched with thicker edges. In the subprotocol, the leader broad- 
casts client requests in a loop until it loses its quorum. When a follower does 
not receive a message from the leader, it considers the leader crashed, and the 
control returns to the leader election protocol. 

An inter-procedural mHO protocol differs from an intra-procedural one only in
the UPDATE function: it may call another protocol and block until the call
returns. An UPDATE may call at most one protocol on each path in its control
flow (a sequence of calls can be implemented using multiple rounds). Thus, an
inter-procedural mHO protocol is a collection of non-recursive mHO protocols,
with a main protocol as entry point. Different protocols exchange messages of
different types.

[Figure: ballot i with an AckBallot round whose update calls a subprotocol; edges labeled "call protocol" and "return from protocol".]

Fig. 7. Inter-procedural execution
4 Formalizing Communication Closure Using Tags


We introduce synchronization tags, which are program annotations that define communication-closed rounds within an asynchronous protocol.


Definition 1 (Tag annotation). For a protocol P, a tag annotation is a tuple $(SyncV, tags, tagm, \preceq, D)$ where:

- $D = (D_1, D_2, \ldots, D_{2m-1}, D_{2m})$, with $(D_i, \preceq_i, \bot_i)$ an ordered domain with a minimal element, denoted $\bot_i$, for $1 \le i \le 2m$. The cardinality of $D_{2i}$ is bounded and all $D_{2i}$ are pairwise disjoint, for $i \in [1, m]$;

- the relation $\preceq$ is the lexicographical order, where the $i$-th component is ordered by $\preceq_i$;

- $SyncV = (v_1, v_2, \ldots, v_{2m-1}, v_{2m})$ is a tuple of fresh variables;

- $tags : Loc \to [SyncV \rightharpoonup Vars]$ annotates each control location with a partially defined injective function that maps SyncV to protocol variables;

- $tagm : \mathcal{M} \to [SyncV \rightharpoonup Fields(M)]$ annotates each message type $M \in \mathcal{M}$ with a partially defined injective function that maps SyncV to the fields of M.


The evaluation of a tag over P's semantics is denoted $([\![tags]\!], [\![tagm]\!])$, where

- $[\![tags]\!] : \Sigma \to [SyncV \to D]$ is defined over the set of local process states $\Sigma = \bigcup_{s \in [\![P]\!]} \bigcup_{p \in P} s(p)$, such that $[\![tags]\!]_s = (d_1, \ldots, d_{|SyncV|})$ with $d_i = [\![x]\!]_s$ if $x = tags([\![pc]\!]_s)(v_i) \in Vars$, and otherwise $d_i = \bot_i$, where $s \in \Sigma$, $x \in Vars$, $v_i$ is the $i$-th component of SyncV, and pc is the program counter;

- $[\![tagm]\!] : \bigcup_{M \in \mathcal{M}} \mathcal{D}(M) \to [SyncV \to D \cup \{\bot\}]$ is a function that, for any message value $m = (m_1, \ldots, m_k)$ in the domain of some message type M, associates a tuple $[\![tagm]\!]_m = (d_1, \ldots, d_{|SyncV|})$ with $d_i = m_j$ if $j = tagm(M)(v_i)$, and otherwise $d_i = \bot_i$, where $v_i$ is the $i$-th element of SyncV.


For every $1 \le i \le m$, $v_{2i-1}$ is called a phase tag and $v_{2i}$ is called a step tag. Given an execution $\pi \in [\![P]\!]$, a transition $s \xrightarrow{A} s'$ in $\pi$ is tagged by $[\![tagm]\!]_m$ if A is send(m) or m = recv(*cond), and by $[\![tags]\!]_s$ otherwise.


For Fig. 3, $SyncV = (v_1, v_2)$, and tags maps $v_1$ and $v_2$ to ballot and label, respectively, at all control locations. That is, a process is in step NewBallot of phase 3 when ballot = 3 and label = NewBallot. For the message type msg, tagm maps $v_1$ and $v_2$ to the fields ballot and lab, respectively. That is, a message (3, NewBallot, 5) is a phase 3, step NewBallot message. To capture that messages of type A are sent locally before messages of type B, the tagging function tagm(B) should be defined on the same synchronization variables as tagm(A).
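The following minimal Python sketch makes the tag evaluation concrete. It computes $[\![tags]\!]_s$ and $[\![tagm]\!]_m$ for the running example. It assumes that the step domain is ordered with NewBallot before AckBallot, and that tags are compared with Python's built-in lexicographic tuple order. The class and function names are hypothetical; they are not taken from Fig. 3.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed order on the step domain D_2 of the running example.
STEP_ORDER = {"NewBallot": 0, "AckBallot": 1}


@dataclass
class LocalState:
    ballot: int  # protocol variable mapped to v1 (phase tag)
    label: str   # protocol variable mapped to v2 (step tag)


@dataclass
class Msg:
    ballot: int   # field mapped to v1
    lab: str      # field mapped to v2
    payload: int  # untagged field


def tag_of_state(s: LocalState) -> Tuple[int, int]:
    """[[tags]]_s: read the variables that tags maps v1 and v2 to."""
    return (s.ballot, STEP_ORDER[s.label])


def tag_of_msg(m: Msg) -> Tuple[int, int]:
    """[[tagm]]_m: read the fields that tagm maps v1 and v2 to."""
    return (m.ballot, STEP_ORDER[m.lab])


if __name__ == "__main__":
    s = LocalState(ballot=3, label="NewBallot")
    m = Msg(ballot=3, lab="NewBallot", payload=5)
    print(tag_of_state(s) == tag_of_msg(m))  # True: a phase 3, step NewBallot message
    # Tuples compare lexicographically, like the order on SyncV.
    print((3, STEP_ORDER["AckBallot"]) < (4, STEP_ORDER["NewBallot"]))  # True
```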


Definition 2 (Synchronization tag). Given a protocol P, an annotation tag $(SyncV, tags, tagm, D, \preceq)$ is called a synchronization tag iff:

(I.) for any local execution $\tau = s_0 A_0 s_1 A_1 \ldots \in [\![P]\!]_p$ of a process p, the sequence $[\![tags]\!]_{s_0} [\![tags]\!]_{s_1} [\![tags]\!]_{s_2} \ldots$ is monotonically increasing w.r.t. $\preceq$. Moreover, for all $j, j' \in [1, m]$ with $j < j'$, if $[\![tags]\!]^{(2j-1, 2j)}_{s_{i+1}} \neq [\![tags]\!]^{(2j-1, 2j)}_{s_i}$ and $[\![tags]\!]^{(2j'-1, 2j')}_{s_{i+1}} \neq [\![tags]\!]^{(2j'-1, 2j')}_{s_i}$, then $[\![tags]\!]^{(2j'-1, 2j')}_{s_{i+1}} = (\bot_{2j'-1}, \bot_{2j'})$, where $[\![tags]\!]^{(2j-1, 2j)}_{s}$ is the projection of the tuple $[\![tags]\!]_s$ on the $(2j-1)$-th and $2j$-th components;

(II.) for any local execution $\tau \in [\![P]\!]_p$, if $s \xrightarrow{send(m, \_)} s'$ is a transition of $\tau$, with m a message value, then $[\![tags]\!]_s = [\![tagm]\!]_m$ and $[\![tags]\!]_{s'} = [\![tags]\!]_s$;

(III.) for any local execution $\tau \in [\![P]\!]_p$, if $s \xrightarrow{m = recv(*cond)} s'$ is a transition of $\tau$, with m a value of some message type, then
  - if $m \neq NULL$, then $[\![tags]\!]_s \preceq [\![tagm]\!]_m$ and $[\![tags]\!]_s = [\![tags]\!]_{s'}$, and
  - if $m = NULL$, then $s = s'$;

(IV.) for any local execution $\tau \in [\![P]\!]_p$, if $s \xrightarrow{stm} s'$ is a transition of $\tau$ such that
  - $s|_{Vars \setminus (\mathcal{M} \cup SyncV)} \neq s'|_{Vars \setminus (\mathcal{M} \cup SyncV)}$, that is, s and s' differ on the variables that are neither of some message type nor in the image of tags,
  - or stm is a send, break, continue, or out(),

  then for all message type variables m in the protocol, $[\![tags]\!]_s = [\![tagm]\!]_m$, where m is the value of m in the state s, and for any Mbox variable of type set of messages, $[\![tags]\!]_s = [\![tagm]\!]_m$ for all $m \in [\![Mbox]\!]_s$;

(V.) for any local execution $\tau \in [\![P]\!]_p$, if $s_1 \xrightarrow{send(m, \_)} s_2 \xrightarrow{stm} s_3 \xrightarrow{send(m', \_)} s_4$ or $s_1 \xrightarrow{m = recv(*cond)} s_2 \xrightarrow{stm} s_3 \xrightarrow{send(m', \_)} s_4$ are sequences of transitions in $\tau$, where stm is any statement except send or recv, then $[\![tagm]\!]_m \prec [\![tagm]\!]_{m'}$. Moreover, if $s_1 \xrightarrow{m = recv(*cond)} s_2 \xrightarrow{stm} s_3 \xrightarrow{m' = recv(*cond')} s_4$ is a sequence of transitions in $\tau$, then $s_2|_{Vars \setminus (\mathcal{M} \cup SyncV)} = s_3|_{Vars \setminus (\mathcal{M} \cup SyncV)}$ or $[\![tags]\!]_{s_2} \prec [\![tags]\!]_{s_3}$.

A protocol P is communication-closed if there exists a synchronization tag for P.


Condition (I.) states that SyncV is not decreased by any local statement (it is a notion of time). Further, only one synchronization pair is modified at a time, except for a reset (i.e., a pair is set to its minimal value) when the value of a preceding pair is updated. Checking this condition translates into checking a transition invariant stating that no assignment decreases the value of the synchronization tuple SyncV. To state this invariant, we introduce "old synchronization variables" that store the values of the synchronization variables before the update.
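The transition invariant behind Condition (I.) can be sketched in Python over a recorded sequence of local tag tuples, rather than over program text as in the tool. The sketch assumes that every domain is encoded as non-negative integers, with $\bot_i$ encoded as 0. Pairs are (phase, step) tuples, outermost protocol first. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Tag = Tuple[int, ...]  # (phase_1, step_1, ..., phase_m, step_m); bottom = 0


def satisfies_condition_I(trace: Sequence[Tag]) -> bool:
    """Check Condition (I.) on the successive tags of one local execution."""
    for old, new in zip(trace, trace[1:]):  # old plays the role of the "old" variables
        if new < old:  # lexicographic order: time never goes backwards
            return False
        pairs = len(old) // 2
        changed = [j for j in range(pairs)
                   if old[2 * j:2 * j + 2] != new[2 * j:2 * j + 2]]
        # Only the outermost changed pair may take an arbitrary new value;
        # every inner pair that changes with it must be reset to (bottom, bottom).
        if any(new[2 * j:2 * j + 2] != (0, 0) for j in changed[1:]):
            return False
    return True
```

For instance, moving from (1, 1, 3, 2) to (2, 0, 0, 0) is accepted because the inner pair is reset. Moving to (2, 0, 1, 0) is rejected.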

Condition (II.) states that any message sent is tagged with a timestamp that equals the current local time. Checking it reduces to an assert statement expressing that for every $v \in SyncV$, $tagm(M)(v) = tags(pc)(v)$, where M is the type of the sent message m and pc is the program location of the send.

Condition (III.) states that any message received is tagged with a timestamp greater than or equal to the current time of the process. To check it, we need to consider the implementation of the functions passed as arguments to a recv statement. These functions (e.g., eq and geq in Fig. 3) implement the filtering of the messages delivered by the network. We inline their code and prove Condition (III.) by comparing the tagged fields of message variables with the phase and step variables. In Fig. 3, the statement assert(m->bal == ballot && m->lab == label) after recv(eq(ballot, label)) checks this condition on the leader's branch.
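A runtime analogue of Conditions (II.) and (III.) can be sketched as a monitor over one process's send and receive events. The tool instead proves the corresponding asserts statically with Verifast. In this hypothetical encoding, each event records its kind, the local tag before and after it, and the message tag (None for a NULL reception). Tags are lexicographically ordered tuples.

```python
from typing import Iterable, Optional, Tuple

Tag = Tuple[int, ...]
Event = Tuple[str, Tag, Tag, Optional[Tag]]  # (kind, tag before, tag after, message tag)


def satisfies_conditions_II_III(events: Iterable[Event]) -> bool:
    for kind, before, after, msg in events:
        if kind not in ("send", "recv"):
            continue  # other statements are constrained by Conditions (I.), (IV.), (V.)
        if after != before:  # sending or receiving never changes the local tag
            return False
        if kind == "send" and msg != before:  # (II.) timestamp equals local time
            return False
        if kind == "recv" and msg is not None and msg < before:  # (III.) no past messages
            return False
    return True
```

Receiving a message from a future round, such as (4, 0) while the local tag is (3, 0), is allowed here. Condition (IV.) is what prevents the process from using it before catching up.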

Condition (IV.) states that if the local state of a process changes (except 
changes of message type variables and synchronization variables), then all locally 
stored messages are timestamped with the current local time. That is, future 
messages cannot be “used” (no variable can be written, except message type 
variables) before the phase and step tags are updated to match the highest 
timestamp. To check it, we need to prove a stronger property than the one 
for (III.). At each control location that writes to either variables of primitive or 
composite type or mailbox variables, the values of the phase (and step) variables 
must be equal to the phase (and step) tagged fields of all allocated message type 
objects. In Fig. 3, the statement assert (equal (mbox, ballot, label)) checks this 
condition on the leader’s branch. It is a separation logic formula that uses the 
inductive list definition of mbox which includes the content of the mbox. 
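The effect of assert(equal(mbox, ballot, label)) can be mimicked in Python as a guard that must hold before any write to a variable that is neither a message variable nor a synchronization variable. This is an illustrative sketch only. It assumes messages are dictionaries with fields ballot and lab, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

Message = Dict[str, object]


def equal(mbox: List[Message], ballot: int, label: str) -> bool:
    """Every stored message carries the current (ballot, label) timestamp."""
    return all(m["ballot"] == ballot and m["lab"] == label for m in mbox)


def condition_IV_holds(mbox: List[Message], m: Optional[Message],
                       ballot: int, label: str) -> bool:
    """Guard at a location that writes local state: the message variable m
    (if set) and the whole mailbox must be timestamped with the current tag."""
    if m is not None and (m["ballot"], m["lab"]) != (ballot, label):
        return False
    return equal(mbox, ballot, label)
```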

The first four conditions imply that there is a global notion of time in the 
asynchronous protocol. However, this does not restrict the number of messages exchanged between two processes with the same timestamp. mHO restricts the message exchange: for every time value (corresponding to an mHO round), processes first send, then receive messages, and then perform a computation without receiving or sending more messages before time is increased. Condition (V.) ensures that the asynchronous protocol has this structure. We check syntactically that the code meets these restrictions.
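Condition (V.) enforces a simple shape: within each tag, a process performs sends, then receptions, then local updates. The tool performs a syntactic check. This Python sketch works on a recorded local execution instead, using hypothetical action names send, recv, and update, and only illustrates the shape being enforced.

```python
from typing import Dict, Hashable, Iterable, Tuple

RANK = {"send": 0, "recv": 1, "update": 2}


def follows_send_recv_update(actions: Iterable[Tuple[Hashable, str]]) -> bool:
    """actions: (tag, kind) pairs of one process in execution order.
    Within every tag, kinds must match the pattern send* recv* update*."""
    last_rank: Dict[Hashable, int] = {}
    for tag, kind in actions:
        rank = RANK[kind]
        if rank < last_rank.get(tag, 0):
            return False  # e.g., a send after a recv or an update with the same tag
        last_rank[tag] = rank
    return True
```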

Intuitively, each pair of synchronization variables uniquely identifies an mHO protocol. To rewrite an asynchronous protocol into nested (inter-procedural) mHO protocols, the tag of the inner protocol should include the tag of the outer one. The asynchronous code advances the time of one protocol at a time, that is, it modifies one synchronization pair at a time. The only exception is when inner protocols terminate: in this case, the time of the outer protocol is advanced, while the time of the inner one is reset. Moreover, different protocols exchange different message types. To be able to order the messages exchanged by an inner protocol w.r.t. the messages exchanged by an outer protocol, the inner protocol messages should also be tagged with the synchronization variables identifying the outer one. This is the case in state machine replication algorithms, where the ballot (or view number), which is the tag of the outer leader election algorithm, also tags all the messages broadcast by the leader in the inner one.
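The following Python sketch illustrates how a nested tag advances. It assumes the outer pair (for example ballot and label) comes first and that $\bot$ is encoded as 0. The helper advance is hypothetical. Advancing a pair resets all inner pairs. Because inner messages also carry the outer pair, they remain comparable with outer messages under the lexicographic order.

```python
from typing import Tuple

Tag = Tuple[int, ...]  # (outer phase, outer step, inner phase, inner step, ...)


def advance(tag: Tag, level: int, phase: int, step: int) -> Tag:
    """Set the pair at `level` to (phase, step) and reset every inner pair."""
    t = list(tag)
    t[2 * level:2 * level + 2] = [phase, step]
    for j in range(level + 1, len(tag) // 2):
        t[2 * j:2 * j + 2] = [0, 0]  # inner protocol time restarts at bottom
    result = tuple(t)
    assert result > tag, "time must strictly advance"
    return result


# An inner message of ballot 3 is ordered before any outer message of ballot 4.
inner_msg = (3, 1, 7, 2)
outer_msg = (4, 0, 0, 0)
assert inner_msg < outer_msg
```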


5 Reducing Asynchronous Executions 


We show that any execution of an asynchronous protocol that has a synchro- 
nization tag can be reduced to an indistinguishable mHO execution. 


Definition 3 (Indistinguishability). Given two executions $\pi$ and $\pi'$ of a protocol P, we say that a process p cannot distinguish locally between $\pi$ and $\pi'$ w.r.t. a set of variables W, denoted $\pi \simeq^W_p \pi'$, if the projections of both executions on the sequence of states of p, restricted to the variables in W, agree up to finite stuttering, denoted $\pi|_{p,W} \equiv \pi'|_{p,W}$.

Two executions $\pi$ and $\pi'$ are indistinguishable w.r.t. a set of variables W, denoted $\pi \simeq^W \pi'$, iff no process can distinguish between them, i.e., $\forall p.\ \pi \simeq^W_p \pi'$.
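For finite prefixes, local indistinguishability can be checked by projecting each state of p onto W and collapsing consecutive repetitions (stuttering). The Python sketch below assumes that states are dictionaries from variable names to values, and that executions are given per process as finite lists. The function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, Hashable, List, Sequence, Set

State = Dict[str, Hashable]


def destutter(states: Sequence[State], W: Set[str]) -> List[tuple]:
    """Project states onto W and remove consecutive duplicates."""
    projected = [tuple(sorted((v, s[v]) for v in W)) for s in states]
    return [key for key, _ in groupby(projected)]


def locally_indistinguishable(a: Sequence[State], b: Sequence[State],
                              W: Set[str]) -> bool:
    return destutter(a, W) == destutter(b, W)


def indistinguishable(pi: Dict[str, Sequence[State]],
                      pi2: Dict[str, Sequence[State]], W: Set[str]) -> bool:
    """No process can distinguish the two executions."""
    return pi.keys() == pi2.keys() and all(
        locally_indistinguishable(pi[p], pi2[p], W) for p in pi)
```

Excluding message variables from W, as Theorem 1 does, is what allows two executions that buffer messages differently to be considered indistinguishable.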
The reduction preserves so-called local properties [7], among which are con- 
sensus and state machine replication. 


Definition 4 (Local properties). A property $\phi$ is local if for any two executions a and b that are indistinguishable, $a \models \phi$ iff $b \models \phi$.


Theorem 1. If there exists a synchronization tag $(SyncV, tags, tagm, D, \preceq)$ for P, then for every $\alpha \in [\![P]\!]$ there exists an mHO execution $\beta$ that is indistinguishable from $\alpha$ w.r.t. all variables except for M or Set(M) variables; therefore, $\alpha$ and $\beta$ satisfy the same local properties.


Proof Sketch. There are two cases to consider. Case (1): every receive transition $s \xrightarrow{m = recv(*cond)} s'$ in $\alpha$ satisfies $[\![tags]\!]_s = [\![tagm]\!]_m$, i.e., all messages received are timestamped with the current local tag of the receiver. We use commutativity arguments to reorder transitions so that we obtain an indistinguishable asynchronous execution in which the transition tags are globally non-decreasing. The interesting case is when a send comes before a lower-tagged receive in $\alpha$. Then the tags of the two transitions imply that the transitions concern different messages, so that swapping them cannot violate send/receive causality.

We exploit the fact that, in the protocols we consider, no correct process locally keeps the tags unchanged forever (e.g., stays in a ballot forever), to arrive at an execution where each subsequence of transitions with the same tag is finite. Still, the resulting execution is not an mHO execution; e.g., for the same tag, a receive may happen before a send on a different process. Condition (V.) ensures that the mHO send-receive-update order is respected locally at each process. From this, together with the observation that sends are left movers and updates are right movers, we obtain a global send-receive-update order, which implies that the resulting execution is an mHO execution.


Case (2): there is a transition $s \xrightarrow{m = recv(*cond)} s'$ in $\alpha$ such that $[\![tags]\!]_s \prec [\![tagm]\!]_m$, that is, a process receives a message with tag k', higher than its state tag k. In mHO, a process only receives messages for its current round. To bring the asynchronous execution into such a form, we use Condition (IV.) and the mHO semantics, where each process goes through all rounds. First, Condition (IV.) ensures that the process must update the tag variables to k' at some point t after receiving the message, if it wants to use the content of the message. It also ensures that the process stutters during the time instances between k and k', w.r.t. the values of the variables that are not of message type. That is, for the intermediate values of abstract time between k and k', no messages are sent or received, and no computation is performed. We split $\alpha$ at point t and add empty send instructions, receive instructions, and instructions that increment the synchronization variables, until the tag reaches k'. If we do this for each jump in $\alpha$, we arrive at an indistinguishable asynchronous execution that falls into Case (1).
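The padding step of Case (2) can be sketched for a single (phase, step) pair. Given the tag k before a jump and the tag k' reached after it, the sketch enumerates the intermediate tags. Each of them becomes an empty round with no send, no reception, and no update. The sketch assumes that steps are numbered 0 to num_steps - 1 within each phase. The function name is hypothetical.

```python
from typing import List, Tuple

Tag = Tuple[int, int]  # (phase, step index)


def intermediate_rounds(k: Tag, k_prime: Tag, num_steps: int) -> List[Tag]:
    """Tags strictly between k and k_prime in lexicographic order."""
    phase, step = k
    rounds: List[Tag] = []
    while True:
        step += 1
        if step == num_steps:  # past the last step: move to the next phase
            phase, step = phase + 1, 0
        if (phase, step) >= k_prime:
            return rounds
        rounds.append((phase, step))  # an empty round inserted into the execution
```

For example, with two steps per phase, a jump from (3, 1) to (5, 0) is padded with the empty rounds (4, 0) and (4, 1).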
6 Rewriting of Asynchronous to mHO


We introduce a rewriting algorithm that takes as input an asynchronous protocol P annotated with a synchronization tag and produces an mHO protocol whose executions are indistinguishable from the executions of P.


Message Reception. mHO receives all messages of a round at once, while in 
the asynchronous code, messages are received one by one. By Condition (V.), 
receive steps that belong to the same round are separated only by instructions 
that store the messages in the mailbox. We assume that message reception is implemented in a simple while(true) loop (the innermost one); cf. the filled boxes in Fig. 3. Conditions (III.) and (IV.) ensure that all messages received in
a loop belong to one round (the current one or the one the code will jump to 
after exiting the reception loop). Thus, we replace a reception loop by havoc 
and assume statements that subsume the possible effects of the loop, satisfying 
all the conditions regarding synchronization tags found in the original receive 
statements. 
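A reception loop and its havoc/assume replacement can be contrasted in a small Python sketch. It assumes that messages are (ballot, label, payload) tuples and that accept stands for the inlined filter passed to recv (for example eq(ballot, label)). The havoc is modelled with a seeded random choice, whereas a verifier would treat it as an arbitrary subset. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Msg = Tuple[int, str, int]  # (ballot, label, payload)


def reception_loop(network: List[Msg], accept: Callable[[Msg], bool],
                   delivered: int) -> List[Msg]:
    """Asynchronous style: receive messages one by one and store accepted ones."""
    mbox: List[Msg] = []
    for m in network[:delivered]:  # the network delivers some prefix, one at a time
        if accept(m):
            mbox.append(m)
    return mbox


def havoc_assume(network: List[Msg], accept: Callable[[Msg], bool],
                 rng: random.Random) -> List[Msg]:
    """mHO style: an arbitrary subset of the messages that pass the filter."""
    candidates = [m for m in network if accept(m)]       # assume(filter holds)
    return [m for m in candidates if rng.random() < 0.5]  # havoc(mbox)


def possible_outcome(mbox: List[Msg], network: List[Msg],
                     accept: Callable[[Msg], bool]) -> bool:
    """Can havoc/assume produce this mailbox?"""
    return all(m in network and accept(m) for m in mbox)
```

Every mailbox that the loop can build is a possible outcome of havoc/assume. This is the sense in which the replacement subsumes the loop.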


Rewriting to an Intra-procedural mHO. When the synchronization tag is defined over a pair of variables, the rewriting produces an intra-procedural mHO protocol. Recall that the values of the synchronization variables encode the round number, so that each update to a pair of synchronization variables marks the beginning of a new mHO round. The difficulty is that different execution prefixes may lead to the same values of the synchronization variables. To compute mHO rounds, the algorithm exploits the position of the updates to the synchronization variables in the control flow graph (CFG). We consider different CFG patterns, from the simplest to the most complicated one.


Fig. 8. Control flow graphs for rewriting. (Color figure online)


Case 1: The CFG is as in Fig. 8(a), i.e., it consists of one loop in which the phase tag ph is incremented once at the beginning of each loop iteration, and for every value of the step tag st there is exactly one assignment in the loop body (the same on all paths). In this case, the phase tag takes the same values as the loop iteration counter (possibly shifted by some initial value). Therefore, the loop body defines the code of an mHO phase. It is easy to structure it into two mHO rounds: the code of round A is the part of the CFG from the beginning of the loop's body up to the second assignment of the st variable, and round B is the rest of the code up to the end of the loop body.


Case 2: The CFG is as in Fig. 8(b). It differs from Case 1 in that the same value is assigned to st in different branches. Each of these assignments marks the beginning of an mHO round B, which thus has multiple entry points. In mHO, a round has only one entry point. To simulate the multiple entry points in mHO, we store in auxiliary variables the values of the conditions along the paths that led to the entry point. In the figure, the code of round A is given by the red box, and the code of round B by the condition in the first blue box, expressed on the auxiliary variable, followed by the respective branches in the blue box.

In our example in Fig. 3, the assignment label = AckBallot appears in the 
leader and the follower branch. Followers send and receive AckBallot messages 
only if they have received a NewBallot. The rewrite introduces old-mbox1 in the 
mHO protocol in Fig. 4 to store this information. Also, we eliminate the variables 
ballot and label; they are subsumed by the phase and round number of mHO. 
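The role of an auxiliary variable such as old-mbox1 can be sketched in Python for the follower side. Round A records the path condition (here, whether the follower received a NewBallot message). Round B then needs only one entry point and branches on the stored value. The state layout and the helper names are hypothetical simplifications; they are not the code of Fig. 4.

```python
from typing import Dict, List, Optional, Tuple

Msg = Tuple[int, str]  # (ballot, label)


def follower_round_a_update(state: Dict[str, object], mbox: List[Msg]) -> None:
    """Round A (NewBallot): store what decides how round B is entered."""
    state["old_mbox1"] = list(mbox)


def follower_round_b_send(state: Dict[str, object], ballot: int) -> Optional[Msg]:
    """Round B (AckBallot): single entry point, branching on the auxiliary variable."""
    if state["old_mbox1"]:  # the follower entered round B through the NewBallot path
        return (ballot, "AckBallot")
    return None  # no NewBallot received: the follower stays silent in round B
```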


Case 3: Let us assume that the CFG is like in Fig. 8(c). It differs from Case 1 
because the phase tag ph is assigned twice. We rewrite it into asynchronous code 
that falls into Case 1 or 2. The resulting CFG is sketched in Fig. 8(d), with only 
one assignment to ph at the beginning of the loop. 

If the second assignment changes the value of ph, then there is a jump. In 
case of a jump, the beginning of a new phase does not coincide with the first 
instruction of the loop. Thus there might be multiple entry points for a phase. We 
introduce (non-deterministic) branching in the control flow to capture different 
entry points: In case there is no jump, the green followed by the purple edge are 
executed within the same phase. In case of a jump, the rewritten code allows the 
green and the purple paths to be executed in different phases; first the green, 
and then the purple in a later phase. We add empty loops to simulate the phases 
that are jumped over. As a pure non-deterministic choice at the top of the loop 
would be too imprecise, we use the variable jump to make sure that the purple edge is executed only once prior to the green edge. In case of multiple assignments, we perform this transformation iteratively for each assignment.

The protocol in Fig. 4 is obtained using two optimizations of the previous construction. First, we do not need empty loops: they are subsumed by the mHO semantics, as all local state changes are caused by some message reception.
Thus, an empty loop is simulated by the execution of a phase with empty HO 
sets. Second, instead of adding jump variables, we reuse the non-deterministic 
value of mbox. This is possible as the jump is preconditioned by a cardinality 
constraint on the mbox, and the green edge is empty (assignments to ballot 
and label correspond to ph++ and reception loops have been reduced to havoc 
statements). 
Nesting. Cases 1-3 capture loops without nesting. Nested loops are rewritten into inter-procedural mHO protocols, using the structure of the tag annotations from Sect. 4. Each loop is rewritten into one protocol, starting with the innermost loop and using the procedure above. For each outer loop, the rewriting first replaces the nested loop with a call to the computed mHO protocol, and then applies the same rewriting procedure. Interpreting each loop as a protocol is pessimistic, and our rewriting may generate deeper nesting than necessary. Inner loops appearing on different branches may belong to the same subprotocol, so that these different loops exchange messages. If tags associates different synchronization variables to different loops, then the rewriting builds one (sub-)protocol for each loop. Otherwise, the rewriting merges the loops into one mHO protocol. To soundly merge several loops into the same mHO protocol, the rewriting algorithm identifies the context in which the inner loop is executed.


Theorem 2. Given an asynchronous protocol P annotated with a synchronization tag $(SyncV, tags, tagm, D, \preceq)$, the rewriting returns an inter-procedural mHO protocol $P^{mHO}$ whose executions are indistinguishable from the executions of P.


7 Experimental Results 


We implemented the rewriting procedure in a prototype tool, ATHOS (https://github.com/alexandrumc/async-to-sync-translation). We applied it to several fault-tolerant distributed protocols. Figure 9 summarizes our results.


Verification of Synchronization Tags. The tool takes as input protocols written in a C embedding of the language from Sect. 2. We use a C embedding so that we can use Verifast [22] to check the conditions in Sect. 4, i.e., the communication closure of an asynchronous protocol. Verifast is a deductive verification tool based on separation logic for sequential programs. Therefore, communication closure is specified in separation logic in our tool. To reason about sending and receiving messages, we inline every recv(*cond) and use predefined specifications for send and recv. We consider only the prototype and the specification of these functions.

The user specifies the synchronization tag in a configuration file by (i) defining the number of (nested) protocols, (ii) for each protocol, the phase and step variables, and (iii) for each message type, the fields that encode the timestamp, i.e., the phase and step number. Figure 9 gives the names of the phase and step variables of our benchmarks. For now, we manually insert the specification to be proven, i.e., the assert statements that capture Conditions (I.) to (V.) in Sect. 4. In Fig. 9, column Async gives the size in LoC of the input asynchronous protocol, and +CC gives the size in LoC of the input annotated with the checks for communication closure (Conditions (I.) to (V.)) and their proofs.
Protocol | Tags | Async | +CC | Sync
Consensus^* [6, Fig. 6] | ph = r_p; st = {Phase1, Phase2, Phase3, Phase4} | … | … | …
Two phase commit | ph = i; st = {Query, Vote, Commit, Ack} | 342 | 596 | 242
Running example (Fig. 3)^{*V} | ph = ballot; st = {NewBallot, AckBallot} | 235 | 810 | …
ViewChange^* [34] | ph1 = view; st1 = {StartViewChange, DoViewChange, StartView} | 352 | 720 | 172
Normal-Op^V [34] | ph = …; st = {Prepare, PrepareOK, Commit} | 266 | 628 | 182
Multi-Paxos^{*V} [25] | ph1 = ballot; st1 = {NewBallot, AckBallot, NewLog}; ph2 = op_number; st2 = {Prepare, PrepareOK, Commit} | 1646 | 621 | 405

Fig. 9. Benchmarks. The superscript * identifies protocols that jump over phases. The superscript V marks protocols whose synchronous counterpart we verified.


Benchmarks. Our tool has rewritten several challenging benchmarks: the algo- 
rithm from [6, Fig. 6] solves consensus using a failure detector. The algorithm 
jumps to a specific decision round, if a special decision message is received. Multi- 
Paxos is the Paxos algorithm from [25] over sequences, without fast paths, where 
the classic path is repeated as long as the leader is stable. Roughly, it performs a leader election similar to our running example (NewBallot is Phase1a), except that the last all-to-all round is replaced by one back-and-forth communication
between the leader and its quorum: the leader receives n/2 acknowledgments that 
contain also the log of its followers (Phase1b). The leader computes the maximal 
log and sends it to all (Phase1aStart). In a subprotocol, a stable leader accepts 
client requests, and broadcasts them one by one to its followers. The broadcast 
is implemented by three rounds, Phase2aClassic, Phase2bClassic, Learn, and is 
repeated as long as the leader is stable. ViewChange is a leader election algo- 
rithm similar to the one in ViewStamped [34]. Normal-Op is the subprotocol 
used in ViewStamped to implement the broadcasting of new commands by a 
stable leader. The last column of Fig. 9 gives the size of the mHO protocol com- 
puted by the rewriting. The implementation uses pycparser [3], to obtain the 
abstract syntax tree of the input protocol. 


Verification. We verified the safety specification (agreement) of the mHO counter- 
parts of the running example (Fig. 3), Normal-Op, and Multi-Paxos, by deductive 
verification: We encoded the specification of these algorithms, i.e., atomic broad- 
cast, consensus, leader election, and the transition relation in Consensus Logic 
CL [13]. CL is a specification logic that allows us to express global properties of 
synchronous systems, and it contains expressions for processes, values, sets, cardinalities, and set comprehension. The verification conditions are soundly discharged by an SMT solver. We used Z3 [33] in our experiments.
For Multi-Paxos, we did a modular proof. First, we prove the correctness of the subprotocol Normal-Op, which implements a loop of atomic broadcasts (executed in case of a stable leader). Then we prove the outer leader election loop correct, by replacing the subprotocol Normal-Op with its specification.


8 Related Work and Conclusions 


Verification of asynchronous protocols has received a lot of attention in recent years. Mechanized verification techniques like IronFleet [21] and Verdi [41] were the first to address the verification of state machine replication. Later, Disel [38] proposed a logic to make the reasoning less protocol-specific, with the tradeoff
of proofs that use the entire message history. At the other end of the spectrum, 
model checking based techniques [2,4,20,23,24] are fully automated but more 
restricted regarding the protocols they apply to. In between, semi-automated 
verification techniques based on deductive verification like natural proofs [12], 
Ivy [36], and PSync [14] try to minimize the user input for similar benchmarks. 

We propose a technique that reduces the verification of an asynchronous pro- 
tocol to a synchronous one, which simplifies the verification task no matter which 
method is chosen. We verified the resulting synchronous protocols with deduc- 
tive verification based on [14]. Our technique uses the notion of communication 
closure [17], which we believe is the essence of any explicit or implicit synchrony 
in the system. We formalized a more general notion of communication closure 
that allows jumping over rounds, which is a catch-up mechanism essential to re- 
synchronize and ensure liveness. Previous reduction techniques focus on shared 
memory systems [16,27], in contrast we focus on message passing concurrency. 

The closest approaches are the results in [4,24] and [2,20], which also exploit the synchrony of the system. Compared to these approaches, our technique allows
more general behaviors, e.g., reasoning about stable leaders is possible because 
communication closure includes (for the first time) unbounded jumps. Also, we 
reduce to a stronger synchronous model, a round-based one instead of a peer to 
peer one, where interleavings w.r.t. actions of other rounds are removed. 

As future work, we will address the relation between communication closure and specific network assumptions, e.g., FIFO channels. We will also address a current limitation of communication closure: processes cannot react to messages from the past. For instance, recovery protocols react to such messages.
Abstract. The principle of strong induction, also known as k-induction, is one of the first techniques for unbounded SAT-based Model Checking (SMC). While it is elegant and simple to apply, properties as such are rarely k-inductive, and when they can be strengthened, there is no effective strategy to guess the depth of induction. It has been mostly displaced by techniques that compute inductive strengthenings based on interpolation and property directed reachability (PDR). In this paper, we present KAvy, an SMC algorithm that effectively uses k-induction to guide interpolation and PDR-style inductive generalization. Unlike pure k-induction, KAvy uses PDR-style generalization to compute and strengthen an inductive trace. Unlike pure PDR, KAvy uses relative k-induction to construct an inductive invariant. The depth of induction is adjusted dynamically by minimizing a proof of unsatisfiability. We have implemented KAvy within the Avy Model Checker and evaluated it on HWMCC instances. Our results show that KAvy is more effective than both Avy and PDR, and that using k-induction leads to faster running times and more solved instances. Further, on a class of benchmarks, called shift, KAvy is orders of magnitude faster than Avy, PDR and k-induction.


1 Introduction 


The principle of strong induction, also known as k-induction, is a generalization 
of (simple) induction that extends the base- and inductive-cases to k steps of a 
transition system [27]. A safety property P is k-inductive in a transition system 
T iff (a) P is true in the first $(k-1)$ steps of T, and (b) if P is assumed to hold for $(k-1)$ consecutive steps, then P holds in k steps of T. Simple induction
is equivalent to 1-induction. Unlike induction, strong induction is complete for 
safety properties: a property P is safe in a transition system T iff there exists a 
natural number k such that P is k-inductive in T (assuming the usual restriction 
to simple paths). This makes k-induction a powerful method for unbounded SAT- 
based Model Checking (SMC). 
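On a small explicit-state system, the definition of k-induction can be checked directly by enumeration, as in the Python sketch below. An SMC tool would instead pose the base and inductive cases as SAT queries. The sketch uses the 8-bit counter of Fig. 1 (reset at 64, property c < 66). As a simplification, it does not impose the simple-path restriction. The function names are hypothetical.

```python
from typing import Callable, Iterable, List

Succ = Callable[[int], List[int]]
Prop = Callable[[int], bool]


def counter_succ(c: int) -> List[int]:
    """The counter of Fig. 1: reset to 0 at 64, otherwise increment (8 bits)."""
    return [0] if c == 64 else [(c + 1) % 256]


def prop_p(c: int) -> bool:
    return c < 66


def is_k_inductive(states: Iterable[int], init: Iterable[int],
                   succ: Succ, prop: Prop, k: int) -> bool:
    states = list(states)
    # (a) Base case: prop holds in all states reachable in at most k-1 steps.
    seen = set(init)
    frontier = set(seen)
    for _ in range(k - 1):
        frontier = {t for s in frontier for t in succ(s)} - seen
        seen |= frontier
    if not all(prop(s) for s in seen):
        return False
    # (b) Inductive case: after k consecutive prop-states, prop still holds.
    paths = [[s] for s in states if prop(s)]
    for _ in range(k - 1):
        paths = [p + [t] for p in paths for t in succ(p[-1]) if prop(t)]
    return all(prop(t) for p in paths for t in succ(p[-1]))


print(is_k_inductive(range(256), [0], counter_succ, prop_p, 1))  # False: 65 -> 66
print(is_k_inductive(range(256), [0], counter_succ, prop_p, 2))  # True
```

With k = 2, the unreachable state 65 can no longer serve as a counterexample: its successor 66 violates the property, so no two-state path ending in 65 satisfies the hypothesis of the inductive case.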

Unlike other SMC techniques, strong induction reduces model checking to 
pure SAT that does not require any additional features such as solving with 
assumptions [12], interpolation [24], resolution proofs [17], Maximal Unsatis- 
fiable Subsets (MUS) [2], etc. It easily integrates with existing SAT-solvers 
and immediately benefits from any improvements in heuristics [22,23], pre-
and in-processing [18], and parallel solving [1]. The simplicity of applying k- 
induction made it the go-to technique for SMT-based infinite-state model check- 
ing [9,11,19]. In that context, it is particularly effective in combination with 
invariant synthesis [14,20]. Moreover, for some theories, strong induction is 
strictly stronger than 1-induction [19]: there are properties that are k-inductive, 
but have no 1-inductive strengthening. 

Notwithstanding all of its advantages, strong induction has been mostly displaced by more recent SMC techniques such as Interpolation [25], Property Directed Reachability [3,7,13,15], and their combinations [29]. In SMC, k-induction is equivalent to induction: any k-inductive property P can be strengthened to an inductive property Q [6,16]. Even though in the worst case Q is
exponentially larger than P [6], this is rarely observed in practice [26]. Further- 
more, the SAT queries get very hard as k increases and usually succeed only 
for rather small values of k. A recent work [16] shows that strong induction can 
be integrated in PDR. However, [16] argues that k-induction is hard to control 
in the context of PDR since choosing a proper value of k is difficult. A wrong 
choice leads to a form of state enumeration. In [16], k is fixed to 5, and regular 
induction is used as soon as 5-induction fails. 

In this paper, we present KAvy, an SMC algorithm that effectively uses k-induction to guide interpolation and PDR-style inductive generalization. Like many state-of-the-art SMC algorithms, KAvy iteratively constructs candidate inductive invariants for a given safety property P. However, the construction of these candidates is driven by k-induction. Whenever P is known to hold up to a bound N, KAvy searches for the smallest k < N + 1 such that either P or some of its strengthenings is k-inductive. Once it finds the right k and strengthening, it computes a 1-inductive strengthening.

It is convenient to think of modern SMC algorithms (e.g., PDR and Avy), and 
k-induction, as two ends of a spectrum. On the one end, modern SMC algorithms 
fix k to 1 and search for a 1-inductive strengthening of P. On the opposite end, k-induction fixes the strengthening of P to be P itself and searches for a k such that P is k-inductive. KAvy dynamically explores this spectrum, exploiting
the interplay between finding the right k and finding the right strengthening. 

As an example, consider the system in Fig. 1 that counts up to 64 and resets.
The property $p : c < 66$ is 2-inductive. IC3, PDR, and Avy iteratively guess a
1-inductive strengthening of p. In the worst case, they require at least 64
iterations. On the other hand, KAvy determines that p is 2-inductive after
2 iterations, computes the 1-inductive invariant $(c \ne 65) \wedge (c < 66)$,
and terminates.

    reg [7:0] c = 0;
    always
      if (c == 64)
        c <= 0;
      else
        c <= c + 1;
    assert property (c < 66);

Fig. 1. An example system.
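Because the register in this example has only 256 states, the claims about p can be checked by brute force. The Python sketch below is our own illustration, not part of KAvy: it assumes that the increment wraps around modulo 256 (8-bit arithmetic) and decides k-inductiveness by enumerating every path of length k, which is only feasible for tiny state spaces.

```python
# Brute-force check of the Fig. 1 claims (illustration only; enumerates all states).
STATES = range(256)  # all valuations of the 8-bit register c
INIT = {0}           # reg [7:0] c = 0


def step(c: int) -> int:
    """Transition relation of Fig. 1; assumes 8-bit wrap-around on increment."""
    return 0 if c == 64 else (c + 1) % 256


def holds_for_k_steps(pred, k: int) -> bool:
    """(k-1)-invariance: pred holds in the first k states of every run from INIT."""
    for c in INIT:
        for _ in range(k):
            if not pred(c):
                return False
            c = step(c)
    return True


def inductive_after_k_steps(pred, k: int) -> bool:
    """Induction step: pred(s_0), ..., pred(s_{k-1}) imply pred(s_k)."""
    for c in STATES:
        for _ in range(k):
            if not pred(c):
                break  # hypothesis violated, so this path is irrelevant
            c = step(c)
        else:  # all k hypotheses held along this path
            if not pred(c):
                return False
    return True


def is_k_inductive(pred, k: int) -> bool:
    return holds_for_k_steps(pred, k) and inductive_after_k_steps(pred, k)


def p(c: int) -> bool:
    return c < 66


def strengthened(c: int) -> bool:
    return c != 65 and c < 66


print(is_k_inductive(p, 1), is_k_inductive(p, 2), is_k_inductive(strengthened, 1))
# prints: False True True
```

The counterexample to 1-induction is the unreachable state c = 65, whose successor is 66; requiring p in two consecutive states rules it out, which is exactly what the strengthening $c \ne 65$ encodes.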

KAvy builds upon the foundations of Avy [29]. Avy first uses Bounded
Model Checking (BMC) [4] to prove that the property P holds up to
bound N. Then, it uses a sequence interpolant [28] and PDR-style inductive
generalization [7] to construct a 1-inductive strengthening candidate for P. We
emphasize that using k-induction to construct 1-inductive candidates allows
KAvy to efficiently utilize many principles from PDR and Avy. While maintaining
k-inductive candidates might seem attractive (since they may be smaller),
they are also much harder to generalize effectively [7].

We implemented KAvy in the Avy Model Checker and evaluated it on the
benchmarks from the Hardware Model Checking Competition (HWMCC). Our
experiments show that KAvy significantly improves the performance of Avy and
solves more SAFE examples than either PDR or Avy. For a specific family of
examples from [21], KAvy exhibits nearly constant time performance, compared to an
exponential growth of Avy, PDR, and k-induction (see Fig. 2b in Sect. 5). This
further emphasizes the effectiveness of efficiently integrating strong induction
into modern SMC.

The rest of the paper is structured as follows. After describing the most
relevant related work, we present the necessary background in Sect. 2 and give an
overview of SAT-based model checking algorithms in Sect. 3. KAvy is presented
in Sect. 4, followed by the presentation of results in Sect. 5. Finally, we conclude
the paper in Sect. 6.


Related Work. KAvy builds on top of the ideas of IC3 [7] and PDR [13]. The
use of interpolation for generating an inductive trace is inspired by Avy [29].
While our algorithm is conceptually similar to Avy, its proof of correctness is
non-trivial and significantly different from that of Avy. We are not aware of
any other work that combines interpolation with strong induction.

There are two prior attempts at enhancing PDR-style algorithms with
k-induction. PD-KIND [19] is an SMT-based Model Checking algorithm for
infinite-state systems inspired by IC3/PDR. It infers k-inductive invariants driven
by the property, whereas KAvy infers 1-inductive invariants driven by k-induction.
PD-KIND uses recursive blocking with interpolation and model-based projection
to block bad states, and k-induction to propagate (push) lemmas to the next level.
While the algorithm is very interesting, it is hard to adapt to the SAT-based
setting (i.e., SMC), and it cannot be compared directly on HWMCC instances.

The closest related work is KIC3 [16]. It modifies the counterexample queue
management strategy in IC3 to utilize k-induction during blocking. The main
limitation is that the value of k must be chosen statically (k = 5 is reported for
the evaluation). KAvy also utilizes k-induction during blocking but computes
the value of k dynamically. Unfortunately, the implementation of KIC3 is not
publicly available, and we could not compare with it directly.


2 Background 


In this section, we present the notation and background required for the
description of our algorithm.
Safety Verification. A symbolic transition system T is a tuple $(\bar{v}, Init, Tr, Bad)$,
where $\bar{v}$ is a set of Boolean state variables. A state of the system is a complete
valuation of all variables in $\bar{v}$ (i.e., the set of states is $\{0,1\}^{|\bar{v}|}$). We write
$\bar{v}' = \{v' \mid v \in \bar{v}\}$ for the set of primed variables, used to represent the next state.
Init and Bad are formulas over $\bar{v}$ denoting the set of initial states and bad states,
respectively, and Tr is a formula over $\bar{v} \cup \bar{v}'$, denoting the transition relation.
With abuse of notation, we use formulas and the sets of states (or transitions)
that they represent interchangeably. In addition, we sometimes use a state s to
denote the formula (cube) that characterizes it. For a formula $\varphi$ over $\bar{v}$, we use
$\varphi(\bar{v}')$, or $\varphi'$ in short, to denote the formula in which every occurrence of $v \in \bar{v}$ is
replaced by $v' \in \bar{v}'$. For simplicity of presentation, we assume that the property
$P = \neg Bad$ is true in the initial state, that is, $Init \Rightarrow P$.

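Given a formula $\varphi(\bar{v})$, an M-to-N-unrolling of T, where $\varphi$ holds in all
intermediate states, is defined by the formula:

$$Tr[\varphi]_M^N = \bigwedge_{i=M}^{N-1} \varphi(\bar{v}_i) \wedge Tr(\bar{v}_i, \bar{v}_{i+1}) \qquad (1)$$

We write $Tr[\varphi]^N$ when $M = 0$ and $Tr_M^N$ when $\varphi = \top$.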

A transition system T is UNSAFE iff there exists a state $s \in Bad$ s.t. s is
reachable, and is SAFE otherwise. Equivalently, T is UNSAFE iff there exists a
number N such that the following unrolling formula is satisfiable:

$$Init(\bar{v}_0) \wedge Tr^N \wedge Bad(\bar{v}_N) \qquad (2)$$

T is SAFE if no such N exists. Whenever T is UNSAFE and $s_N \in Bad$ is a
reachable state, the path from $s_0 \in Init$ to $s_N$ is called a counterexample.
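For a finite, explicitly enumerated system, checking formula (2) for increasing N amounts to a breadth-first exploration. The Python sketch below is an explicit-state stand-in for SAT-based BMC (it tracks the set of states reachable in exactly N steps instead of solving (2) with a SAT solver); the function name `bmc` and its successor-function interface are our own assumptions.

```python
from typing import Callable, Iterable, List, Optional, Set


def bmc(init: Set[int], succ: Callable[[int], Iterable[int]],
        bad: Callable[[int], bool], max_bound: int) -> Optional[int]:
    """Return the smallest N <= max_bound for which formula (2),
    Init(v_0) & Tr^N & Bad(v_N), is satisfiable, or None if there is none."""
    frontier = set(init)  # states reachable in exactly n steps
    for n in range(max_bound + 1):
        if any(bad(s) for s in frontier):
            return n  # a counterexample of length n exists
        frontier = {t for s in frontier for t in succ(s)}
    return None


def fig1_step(c: int) -> List[int]:
    """Fig. 1 counter; assumes 8-bit wrap-around on increment."""
    return [0 if c == 64 else (c + 1) % 256]


# Hypothetical 3-bit wrap-around counter in which state 5 is bad.
print(bmc({0}, lambda c: [(c + 1) % 8], lambda c: c == 5, 10))  # 5
print(bmc({0}, fig1_step, lambda c: c >= 66, 200))              # None
```

A `None` result only states that no counterexample of length at most `max_bound` exists; this is why BMC alone cannot establish that T is SAFE, and why the invariants below are needed.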
An inductive invariant is a formula Inv that satisfies:

$$Init(\bar{v}) \Rightarrow Inv(\bar{v}) \qquad Inv(\bar{v}) \wedge Tr(\bar{v}, \bar{v}') \Rightarrow Inv(\bar{v}') \qquad (3)$$

A transition system T is SAFE iff there exists an inductive invariant Inv s.t.
$Inv(\bar{v}) \Rightarrow P(\bar{v})$. In this case we say that Inv is a safe inductive invariant.

The safety verification problem is to decide whether a transition system T 
is SAFE or UNSAFE, i.e., whether there exists a safe inductive invariant or a 
counterexample. 


Strong Induction. Strong induction (or k-induction) is a generalization of the
notion of an inductive invariant that is similar to how “simple” induction is
generalized in mathematics. A formula Inv is k-invariant in a transition system
T if it is true in the first k steps of T. That is, the following formula is valid:
$Init(\bar{v}_0) \wedge Tr^k \Rightarrow (\bigwedge_{i=0}^{k} Inv(\bar{v}_i))$. A formula Inv is a k-inductive invariant iff
Inv is a (k − 1)-invariant and is inductive after k steps of T, i.e., the following
formula is valid: $Tr[Inv]^k \Rightarrow Inv(\bar{v}_k)$. Compared to simple induction, k-induction
strengthens the hypothesis in the induction step: Inv is assumed to hold between
steps 0 and k − 1 and is established in step k. Whenever $Inv \Rightarrow P$, we say that Inv
is a safe k-inductive invariant. An inductive invariant is a 1-inductive invariant.
Theorem 1. Given a transition system T, there exists a safe inductive invariant
w.r.t. T iff there exists a safe k-inductive invariant w.r.t. T.


Theorem 1 states that the k-induction principle is as complete as 1-induction. One
direction is trivial (since we can take k = 1). The other can be strengthened
further: for every k-inductive invariant $Inv_k$ there exists a 1-inductive
strengthening $Inv_1$ such that $Inv_1 \Rightarrow Inv_k$. Theoretically, $Inv_1$ might be exponentially
bigger than $Inv_k$ [6]. In practice, both invariants tend to be of similar size.

We say that a formula $\varphi$ is k-inductive relative to F if it is a (k − 1)-invariant
and $Tr[\varphi \wedge F]^k \Rightarrow \varphi(\bar{v}_k)$.


Craig Interpolation [10]. We use an extension of Craig Interpolants to sequences,
which is common in Model Checking. Let $A = [A_1, \ldots, A_N]$ such that
$A_1 \wedge \cdots \wedge A_N$ is unsatisfiable. A sequence interpolant $I = \textsc{SeqItp}(A)$ for A is a sequence of
formulas $I = [I_2, \ldots, I_N]$ such that (a) $A_1 \Rightarrow I_2$, (b) $\forall 1 < i < N \cdot I_i \wedge A_i \Rightarrow I_{i+1}$,
(c) $I_N \wedge A_N \Rightarrow \bot$, and (d) $I_i$ is over variables that are shared between the
corresponding prefix and suffix of A.
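In a finite-state setting, a formula over the shared variables $\bar{v}_i$ can be viewed as a set of states, and successive images of the initial states give one valid choice (the strongest one) of sequence interpolant for a BMC-style partitioning. The Python sketch below is only an explicit-state illustration under that assumption, not how SAT-based interpolants are computed. It uses the hypothetical partitioning $A_1 = Init(\bar{v}_1) \wedge Tr(\bar{v}_1, \bar{v}_2)$, $A_i = Tr(\bar{v}_i, \bar{v}_{i+1})$ for $1 < i < N$, and $A_N = Bad(\bar{v}_N)$, and checks conditions (a) to (c); condition (d) holds by construction because each $I_i$ is a set of states, i.e., a predicate over $\bar{v}_i$ alone.

```python
from typing import Callable, Iterable, List, Optional, Set

Succ = Callable[[int], Iterable[int]]


def image(states: Set[int], succ: Succ) -> Set[int]:
    """Post-image: all successors of the given states."""
    return {t for s in states for t in succ(s)}


def strongest_seq_itp(init: Set[int], succ: Succ, bad: Callable[[int], bool],
                      n: int) -> Optional[List[Set[int]]]:
    """Return [I_2, ..., I_n] for A_1 = Init & Tr, A_i = Tr (1 < i < n),
    A_n = Bad, or None if A_1 & ... & A_n is satisfiable."""
    itps = [image(set(init), succ)]    # I_2: strongest formula implied by A_1
    for _ in range(3, n + 1):          # I_3, ..., I_n
        itps.append(image(itps[-1], succ))
    if any(bad(s) for s in itps[-1]):  # I_n & A_n satisfiable: no interpolant
        return None
    return itps


def is_seq_itp(init: Set[int], succ: Succ, bad: Callable[[int], bool],
               itps: List[Set[int]]) -> bool:
    """Check conditions (a)-(c) for a candidate [I_2, ..., I_n]."""
    if not image(set(init), succ) <= itps[0]:   # (a) A_1 => I_2
        return False
    for cur, nxt in zip(itps, itps[1:]):        # (b) I_i & A_i => I_{i+1}
        if not image(cur, succ) <= nxt:
            return False
    return not any(bad(s) for s in itps[-1])    # (c) I_n & A_n => false


def fig1_step(c: int) -> List[int]:
    """Fig. 1 counter; assumes 8-bit wrap-around on increment."""
    return [0 if c == 64 else (c + 1) % 256]


def fig1_bad(c: int) -> bool:
    return c >= 66


print(strongest_seq_itp({0}, fig1_step, fig1_bad, 4))              # [{1}, {2}, {3}]
print(is_seq_itp({0}, fig1_step, fig1_bad, [set(range(65))] * 3))  # True
print(is_seq_itp({0}, fig1_step, fig1_bad, [set(range(66))] * 3))  # False
```

The last two calls show that interpolants need not be the strongest: $c \le 64$ at every position is also a valid sequence interpolant, whereas $c < 66$ violates condition (b) because 65 steps to 66.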


3 SAT-Based Model Checking 


In this section, we give a brief overview of SAT-based Model Checking
algorithms: IC3/PDR [7,13], and Avy [29]. While these algorithms are well-known,
we give a uniform presentation and establish notation necessary for the rest of
the paper. We fix a symbolic transition system $T = (\bar{v}, Init, Tr, Bad)$.

The main data structure of these algorithms is a sequence of candidate
invariants, called an inductive trace. An inductive trace, or simply a trace, is a sequence
of formulas $F = [F_0, \ldots, F_N]$ that satisfy the following two properties:

$$Init(\bar{v}) = F_0(\bar{v}) \qquad \forall 0 \le i < N \cdot F_i(\bar{v}) \wedge Tr(\bar{v}, \bar{v}') \Rightarrow F_{i+1}(\bar{v}') \qquad (4)$$


An element $F_i$ of a trace is called a frame. The index of a frame is called a
level. F is clausal when all its elements are in CNF. For convenience, we view
a frame as a set of clauses, and assume that a trace is padded with $\top$ until the
required length. The size of $F = [F_0, \ldots, F_N]$ is $|F| = N$. For $k \le N$, we write
$F^k = [F_k, \ldots, F_N]$ for the k-suffix of F.

A trace F of size N is stronger than a trace G of size M iff
$\forall 0 \le i \le \min(N, M) \cdot F_i(\bar{v}) \Rightarrow G_i(\bar{v})$. A trace is safe if each $F_i$ is safe: $\forall i \cdot F_i \Rightarrow \neg Bad$;
monotone if $\forall 0 \le i < N \cdot F_i \Rightarrow F_{i+1}$. In a monotone trace, a frame $F_i$
over-approximates the set of states reachable in up to i steps of Tr. A trace is
closed if $\exists 1 \le i \le N \cdot F_i \Rightarrow (\bigvee_{j=0}^{i-1} F_j)$.
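Over explicit sets of states, each of these trace properties is a simple set inclusion. The Python sketch below illustrates this under the assumption that frames are Python sets of states (rather than CNF formulas) and that a trace is a list $[F_0, \ldots, F_N]$; the function names are hypothetical.

```python
from typing import Callable, Iterable, List, Set

Succ = Callable[[int], Iterable[int]]


def image(states: Set[int], succ: Succ) -> Set[int]:
    return {t for s in states for t in succ(s)}


def is_trace(F: List[Set[int]], init: Set[int], succ: Succ) -> bool:
    """Conditions (4): F_0 = Init and F_i & Tr => F_{i+1}' for 0 <= i < N."""
    return F[0] == set(init) and all(
        image(F[i], succ) <= F[i + 1] for i in range(len(F) - 1))


def is_safe(F: List[Set[int]], bad: Callable[[int], bool]) -> bool:
    return not any(bad(s) for frame in F for s in frame)


def is_monotone(F: List[Set[int]]) -> bool:
    return all(F[i] <= F[i + 1] for i in range(len(F) - 1))


def is_closed(F: List[Set[int]]) -> bool:
    """Exists 1 <= i <= N such that F_i => F_0 | ... | F_{i-1}."""
    return any(F[i] <= set().union(*F[:i]) for i in range(1, len(F)))


def fig1_step(c: int) -> List[int]:
    """Fig. 1 counter; assumes 8-bit wrap-around on increment."""
    return [0 if c == 64 else (c + 1) % 256]


F = [{0}, set(range(66))]                  # [c = 0, c < 66], size N = 1
G = [{0}, set(range(65)), set(range(65))]  # [c = 0, c <= 64, c <= 64]
print(is_trace(F, {0}, fig1_step), is_monotone(F), is_closed(F))  # True True False
print(is_trace(G, {0}, fig1_step), is_closed(G))                  # True True
```

Once G is closed, the union of its frames ($c \le 64$) is a safe inductive invariant, which is the termination condition used by PDR and Avy below.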


We define an unrolling formula of a k-suffix of a trace $F = [F_0, \ldots, F_N]$ as:

$$Tr[F^k] = \bigwedge_{i=k}^{|F|} F_i(\bar{v}_i) \wedge Tr(\bar{v}_i, \bar{v}_{i+1}) \qquad (5)$$

We write $Tr[F]$ to denote an unrolling of the 0-suffix of F (i.e., F itself). Intuitively,
$Tr[F^k]$ is satisfiable iff there is an execution of Tr from step k to step $|F| + 1$
that is consistent with the k-suffix $F^k$. If a transition system T admits a safe trace F of size $|F| = N$,
then T does not admit counterexamples of length less than N. A safe trace F,
with $|F| = N$, is extendable with respect to level $0 \le i \le N$ iff there exists a
safe trace G stronger than F such that $|G| > N$ and $F_i \wedge Tr \Rightarrow G'_{i+1}$. G and
the corresponding level i are called an extension trace and an extension level
of F, respectively. SAT-based model checking algorithms work by iteratively
extending a given safe trace F of size N to a safe trace of size N + 1.

An extension trace is not unique, but there is a largest extension level. We
denote the set of all extension levels of F by W(F). The existence of an extension
level i implies that an unrolling of the i-suffix does not contain any Bad states:


Proposition 1. Let F be a safe trace. Then, i, $0 \le i \le N$, is an extension level
of F iff the formula $Tr[F^i] \wedge Bad(\bar{v}_{N+1})$ is unsatisfiable.


Example 1. For Fig. 1, $F = [c = 0, c < 66]$ is a safe trace of size 1. The formula
$(c < 66) \wedge Tr \wedge \neg(c' < 66)$ is satisfiable. Therefore, there does not exist an
extension trace at level 1. Since $(c = 0) \wedge Tr \wedge (c' < 66) \wedge Tr' \wedge (c'' \ge 66)$ is
unsatisfiable, the trace is extendable at level 0. For example, a valid extension
trace at level 0 is $G = [c = 0, c < 2, c < 66]$.
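Proposition 1 can be checked mechanically on this example. The Python sketch below again treats frames as explicit sets of states and assumes 8-bit wrap-around for the counter; `suffix_reaches_bad` is a hypothetical helper that decides satisfiability of $Tr[F^i] \wedge Bad(\bar{v}_{N+1})$ by forward image computation instead of a SAT query.

```python
from typing import Callable, Iterable, List, Set

Succ = Callable[[int], Iterable[int]]


def image(states: Set[int], succ: Succ) -> Set[int]:
    return {t for s in states for t in succ(s)}


def suffix_reaches_bad(F: List[Set[int]], i: int, succ: Succ,
                       bad: Callable[[int], bool]) -> bool:
    """Is Tr[F^i] & Bad(v_{N+1}) satisfiable? (explicit-state, Proposition 1)"""
    N = len(F) - 1
    frontier = set(F[i])                          # step i satisfies F_i
    for j in range(i + 1, N + 1):
        frontier = image(frontier, succ) & F[j]   # step j satisfies F_j
    return any(bad(t) for t in image(frontier, succ))  # step N+1 is bad


def extension_levels(F: List[Set[int]], succ: Succ,
                     bad: Callable[[int], bool]) -> Set[int]:
    """W(F): all levels at which the safe trace F is extendable."""
    return {i for i in range(len(F)) if not suffix_reaches_bad(F, i, succ, bad)}


def fig1_step(c: int) -> List[int]:
    """Fig. 1 counter; assumes 8-bit wrap-around on increment."""
    return [0 if c == 64 else (c + 1) % 256]


def fig1_bad(c: int) -> bool:
    return c >= 66


F = [{0}, set(range(66))]                        # [c = 0, c < 66]
print(extension_levels(F, fig1_step, fig1_bad))  # {0}

G = [{0}, {0, 1}, set(range(66))]                # [c = 0, c < 2, c < 66]
print(image(F[0], fig1_step) <= G[1])            # True: F_0 & Tr => G_1'
```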


Both PDR and Avy iteratively extend a safe trace until either the extension
is closed or a counterexample is found. However, they differ in how exactly the
trace is extended. In the rest of this section, we present Avy and PDR through
the lens of extension levels. The goal of this presentation is to make the paper
self-contained. We omit many important optimization details, and refer the reader
to the original papers [7, 13, 29].

PDR maintains a monotone, clausal trace F with Init as the first frame ($F_0$).
The trace F is extended by recursively computing and blocking (if possible)
states that can reach Bad (called bad states). A bad state is blocked at the largest
level possible. Algorithm 1 shows PDRBLOCK, the backward search procedure
that identifies and blocks bad states. PDRBLOCK maintains a queue of states
and the levels at which they have to be blocked. The smallest level at which
blocking occurs is tracked in order to show the construction of the extension
trace. For each state s in the queue, it is checked whether s can be blocked by
the previous frame $F_{d-1}$ (line 5). If not, a predecessor state t of s that satisfies
$F_{d-1}$ is computed and added to the queue (line 7). If a predecessor state is found
at level 0, the trace is not extendable and an empty trace is returned. If the state
s is blocked at level d, PDRINDGEN is called to generate a clause that blocks
s and possibly others. The clause is then added to all the frames at levels less
than or equal to d. PDRINDGEN is a crucial optimization to PDR. However, we
do not explain it for the sake of simplicity. The procedure terminates whenever
there are no more states to be blocked (or a counterexample was found at line 4).
By construction, the output trace G is an extension trace of F at the extension
level w. Once PDR extends its trace, PDRPUSH is called to check if the clauses
it learnt are also true at higher levels. PDR terminates when the trace is closed.
Algorithm 1. PDRBLOCK.
Input: A transition system T = (Init, Tr, Bad)
Input: A safe trace F with |F| = N
Output: An extension trace G or an empty trace
1  w ← N + 1; G ← F; Q.push((Bad, N + 1))
2  while ¬Q.empty() do
3    (s, d) ← Q.pop()
4    if d = 0 then return []
5    if isSat(F_{d−1}(v̄) ∧ Tr(v̄, v̄′) ∧ s(v̄′)) then
6      t ← predecessor(s)
7      Q.push((t, d − 1))
8      Q.push((s, d))
9    else
10     ∀0 ≤ i ≤ d · G_i ← (G_i ∧ PDRINDGEN(¬s))
11     w ← min(w, d)
12 return G

Algorithm 2. Avy.
Input: A transition system T = (Init, Tr, Bad)
Output: SAFE/UNSAFE
1  F_0 ← Init; N ← 0
2  repeat
3    if isSat(Tr[F^0] ∧ Bad(v̄_{N+1})) then return UNSAFE
4    k ← max{i | ¬isSat(Tr[F^i] ∧ Bad(v̄_{N+1}))}
5    I_{k+1}, ..., I_{N+1} ← SEQITP(Tr[F^k] ∧ Bad(v̄_{N+1}))
6    ∀0 ≤ i ≤ k · G_i ← F_i
7    ∀k < i ≤ (N + 1) · G_i ← (F_i ∧ I_i)
8    F ← AVYMKTRACE([G_0, ..., G_{N+1}])
9    F ← PDRPUSH(F)
10   if ∃1 ≤ i ≤ N · F_i ⇒ (⋁_{j=0}^{i−1} F_j) then return SAFE
11   N ← N + 1
12 until ∞


Avy, shown in Algorithm 2, is an alternative to PDR that combines interpolation
and recursive blocking. Avy starts with a trace F, with $F_0 = Init$, that
is extended in every iteration of the main loop. A counterexample is returned
whenever F is not extendable (line 3). Otherwise, a sequence interpolant is
extracted from the unsatisfiability of $Tr[F^k] \wedge Bad(\bar{v}_{N+1})$, where $k = \max(W(F))$
is the maximal extension level computed at line 4. A longer trace
$G = [G_0, \ldots, G_N, G_{N+1}]$ is constructed using the sequence interpolant (line 7).
Observe that G is an extension trace of F. While G is safe, it is neither monotone
nor clausal. A helper routine AVYMKTRACE is used to convert G to a proper
PDR trace on line 8 (see [29] for the details on AVYMKTRACE). Avy converges
when the trace is closed.


4 Interpolating k-Induction 


In this section, we present KAvy, an SMC algorithm that uses the principle
of strong induction to extend an inductive trace. The section is structured as
follows. First, we introduce the concept of extending a trace using relative
k-induction. Second, we present KAvy and describe the details of how k-induction
is used to compute an extended trace. Third, we describe two techniques for
computing maximal parameters to apply strong induction. Unless stated otherwise,
we assume that all traces are monotone.

A safe trace F, with $|F| = N$, is strongly extendable with respect to (i, k),
where $1 \le k \le i+1 \le N+1$, iff there exists a safe inductive trace G stronger
than F such that $|G| > N$ and $Tr[F_i]^k \Rightarrow G_{i+1}(\bar{v}_k)$. We refer to the pair (i, k) as a
strong extension level (SEL), and to the trace G as an (i, k)-extension trace, or
simply a strong extension trace (SET) when (i, k) is not important. Note that
for k = 1, G is just an extension trace.


Example 2. For Fig. 1, the trace $F = [c = 0, c < 66]$ is strongly extendable at
level 1. A valid (1, 2)-extension trace is $G = [c = 0, (c \ne 65) \wedge (c < 66), c < 66]$.
Note that $(c < 66)$ is 2-inductive relative to $F_1$, i.e., $Tr[F_1]^2 \Rightarrow (c_2 < 66)$, where $c_2$ denotes c in state $\bar{v}_2$.
We write K(F) for the set of all SELs of F. We define an order on SELs by:
$(i_1, k_1) \preceq (i_2, k_2)$ iff (i) $i_1 < i_2$ or (ii) $i_1 = i_2 \wedge k_1 \ge k_2$. The maximal SEL is
$\max(K(F))$.


Algorithm 3. KAvy algorithm.
Input: A transition system T = (Init, Tr, Bad)
Output: SAFE/UNSAFE
1  F ← [Init]; N ← 0
2  repeat
     // Invariant: F is a monotone, clausal, safe, inductive trace
3    U ← Tr[F^0] ∧ Bad(v̄_{N+1})
4    if isSat(U) then return UNSAFE
5    (i, k) ← max{(i, k) | ¬isSat(Tr[F^i]^k ∧ Bad(v̄_{N+1}))}
6    [F_0, ..., F_{N+1}] ← KAVYEXTEND(F, (i, k))
7    [F_0, ..., F_{N+1}] ← PDRPUSH([F_0, ..., F_{N+1}])
8    if ∃1 ≤ i ≤ N · F_i ⇒ (⋁_{j=0}^{i−1} F_j) then return SAFE
9    N ← N + 1
10 until ∞


Note that the existence of a SEL (i, k) means that an unrolling of the i-suffix
with $F_i$ repeated k times does not contain any bad states. We use $Tr[F^i]^k$ to
denote this characteristic formula for SEL (i, k):

$$Tr[F^i]^k = \begin{cases} Tr[F_i]_{i-k+1}^{i+1} \wedge Tr[F^{i+1}] & \text{if } 0 \le i < N \\ Tr[F_N]_{N-k+1}^{N+1} & \text{if } i = N \end{cases} \qquad (6)$$


Proposition 2. Let F be a safe trace, where $|F| = N$. Then, (i, k),
$1 \le k \le i+1 \le N+1$, is an SEL of F iff the formula $Tr[F^i]^k \wedge Bad(\bar{v}_{N+1})$ is unsatisfiable.


The level i in the maximal SEL (i, k) of a given trace F is greater than or equal
to the maximal extension level of F:


Lemma 1. Let $(i, k) = \max(K(F))$; then $i \ge \max(W(F))$.


Hence, extensions based on the maximal SEL are constructed from frames at levels
at least as high as those used by extensions based on the maximal extension level.


Example 3. For Fig. 1, the trace [c = 0,c < 66] has a maximum extension level 
of 0. Since (c < 66) is 2-inductive, the trace is strongly extendable at level 1 (as 
was seen in Example 2). 
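The same explicit-state view makes Proposition 2, the order on SELs, and Lemma 1 concrete. In the Python sketch below (an illustration only; frames are explicit sets and the counter is assumed to wrap around at 256), the hypothetical helper `sel_formula_sat` follows the shape of (6): $F_i$ constrains the k steps $i-k+1, \ldots, i$, $F_j$ constrains step j for $i < j \le N$, and Bad is checked at step $N+1$. On the trace of Example 3 it yields $K(F) = \{(0,1), (1,2)\}$, so the maximal SEL is (1, 2) while the maximal extension level is 0.

```python
from typing import Callable, Iterable, List, Set, Tuple

Succ = Callable[[int], Iterable[int]]


def image(states: Set[int], succ: Succ) -> Set[int]:
    return {t for s in states for t in succ(s)}


def sel_formula_sat(F: List[Set[int]], i: int, k: int, succ: Succ,
                    bad: Callable[[int], bool]) -> bool:
    """Is Tr[F^i]^k & Bad(v_{N+1}) satisfiable? (explicit-state, Proposition 2)"""
    N = len(F) - 1
    frontier = set(F[i])                          # step i-k+1 satisfies F_i
    for _ in range(k - 1):                        # steps i-k+2 .. i satisfy F_i
        frontier = image(frontier, succ) & F[i]
    for j in range(i + 1, N + 1):                 # steps i+1 .. N satisfy F_j
        frontier = image(frontier, succ) & F[j]
    return any(bad(t) for t in image(frontier, succ))  # step N+1 is bad


def strong_extension_levels(F: List[Set[int]], succ: Succ,
                            bad: Callable[[int], bool]) -> Set[Tuple[int, int]]:
    """K(F): all (i, k) with 1 <= k <= i + 1 <= N + 1 whose formula is unsat."""
    N = len(F) - 1
    return {(i, k) for i in range(N + 1) for k in range(1, i + 2)
            if not sel_formula_sat(F, i, k, succ, bad)}


def max_sel(sels: Set[Tuple[int, int]]) -> Tuple[int, int]:
    """Maximum w.r.t. the SEL order: larger level first, then smaller depth k."""
    return max(sels, key=lambda ik: (ik[0], -ik[1]))


def fig1_step(c: int) -> List[int]:
    """Fig. 1 counter; assumes 8-bit wrap-around on increment."""
    return [0 if c == 64 else (c + 1) % 256]


def fig1_bad(c: int) -> bool:
    return c >= 66


F = [{0}, set(range(66))]                          # [c = 0, c < 66]
K = strong_extension_levels(F, fig1_step, fig1_bad)
W = {i for (i, k) in K if k == 1}                  # Proposition 1 is the case k = 1
print(sorted(K), max_sel(K), max(W))               # [(0, 1), (1, 2)] (1, 2) 0
```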


KAvy Algorithm. KAvy is shown in Algorithm 3. It starts with an inductive trace
F = [Init] and iteratively extends F using SELs. A counterexample is returned
if the trace cannot be extended (line 4). Otherwise, KAvy computes the maximal
SEL (line 5) (described in Sect. 4.2). Then, it constructs a strong
extension trace using KAVYEXTEND (line 6) (described in Sect. 4.1). Finally,
PDRPUSH is called, and the trace is checked for closure (line 8). Note that F is a
monotone, clausal, safe inductive trace throughout the algorithm.
4.1 Extending a Trace with Strong Induction


In this section, we describe the procedure KAVYEXTEND (shown in Algorithm 4)
that, given a trace F of size $|F| = N$ and an SEL (i, k) of F, constructs an
(i, k)-extension trace G of size $|G| = N + 1$. The procedure itself is fairly simple, but
its proof of correctness is complex. We first present the theoretical results that
connect sequence interpolants with strong extension traces, then the procedure,
and then details of its correctness. Throughout this section, we fix a trace F and its
SEL (i, k).


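Sequence Interpolation for SEL. Let (i, k) be an SEL of F. By Proposition 2,
$W = Tr[F^i]^k \wedge Bad(\bar{v}_{N+1})$ is unsatisfiable. Let $A = \{A_{i-k+1}, \ldots, A_{N+1}\}$ be a
partitioning of W defined as follows:

$$A_j = \begin{cases} F_i(\bar{v}_j) \wedge Tr(\bar{v}_j, \bar{v}_{j+1}) & \text{if } i-k+1 \le j \le i \\ F_j(\bar{v}_j) \wedge Tr(\bar{v}_j, \bar{v}_{j+1}) & \text{if } i < j \le N \\ Bad(\bar{v}_{N+1}) & \text{if } j = N+1 \end{cases}$$

Since $(\bigwedge A) = W$, A is unsatisfiable. Let $I = [I_{i-k+2}, \ldots, I_{N+1}]$ be a sequence
interpolant corresponding to A. Then, I satisfies the following properties:

$$\begin{array}{ll} F_i \wedge Tr \Rightarrow I'_{i-k+2} & \forall i-k+2 \le j \le i \cdot (F_i \wedge I_j) \wedge Tr \Rightarrow I'_{j+1} \\ I_{N+1} \Rightarrow \neg Bad & \forall i < j \le N \cdot (F_j \wedge I_j) \wedge Tr \Rightarrow I'_{j+1} \end{array} \qquad (\heartsuit)$$

Note that in (♥), both i and k are fixed; they are the (i, k)-extension level.
Furthermore, in the top row $F_i$ is fixed as well.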

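The conjunction of the first k interpolants in I is k-inductive relative to the
frame $F_i$:


Lemma 2. The formula $F_{i+1} \wedge (\bigwedge_{m=i-k+2}^{i+1} I_m)$ is k-inductive relative to $F_i$.

Proof. Since $F_i$ and $F_{i+1}$ are consecutive frames of a trace, $F_i \wedge Tr \Rightarrow F'_{i+1}$. Thus,
$\forall i-k+1 \le j \le i \cdot Tr[F_i]_{i-k+1}^{j+1} \Rightarrow F_{i+1}(\bar{v}_{j+1})$. Moreover, by (♥), $F_i \wedge Tr \Rightarrow I'_{i-k+2}$
and $\forall i-k+2 \le j < i+1 \cdot (F_i \wedge I_j) \wedge Tr \Rightarrow I'_{j+1}$. Equivalently,
$\forall i-k+2 \le j < i+1 \cdot Tr[F_i \wedge I_j]_j^{j+1} \Rightarrow I_{j+1}(\bar{v}_{j+1})$. By induction over the difference between
(i + 1) and (i − k + 2), we show that $Tr[F_i]_{i-k+1}^{i+1} \Rightarrow (F_{i+1} \wedge \bigwedge_{m=i-k+2}^{i+1} I_m)(\bar{v}_{i+1})$,
which concludes the proof.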


We use Lemma 2 to define a strong extension trace G:


Lemma 3. Let $G = [G_0, \ldots, G_{N+1}]$ be an inductive trace defined as follows:

$$G_j = \begin{cases} F_j & \text{if } 0 \le j < i-k+2 \\ (F_j \wedge \bigwedge_{m=i-k+2}^{j} I_m) & \text{if } i-k+2 \le j < i+2 \\ (F_j \wedge I_j) & \text{if } i+2 \le j < N+1 \\ I_{N+1} & \text{if } j = N+1 \end{cases}$$

Then, G is an (i, k)-extension trace of F (not necessarily monotone).


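Proof. By Lemma 2, $G_{i+1}$ is k-inductive relative to $F_i$. Therefore, it is sufficient
to show that G is a safe inductive trace that is stronger than F. By definition,
$\forall 0 \le j \le N \cdot G_j \Rightarrow F_j$. By (♥), $F_i \wedge Tr \Rightarrow I'_{i-k+2}$ and
$\forall i-k+2 \le j \le i \cdot (F_i \wedge I_j) \wedge Tr \Rightarrow I'_{j+1}$. By induction over j,
$((F_i \wedge \bigwedge_{m=i-k+2}^{j} I_m) \wedge Tr) \Rightarrow \bigwedge_{m=i-k+2}^{j+1} I'_m$ for all $i-k+2 \le j \le i$.
Since F is a monotone trace, $F_j \Rightarrow F_i$ and $F_j \wedge Tr \Rightarrow F'_{j+1}$ for $j \le i$; hence
$\forall i-k+2 \le j \le i \cdot ((F_j \wedge \bigwedge_{m=i-k+2}^{j} I_m) \wedge Tr) \Rightarrow (F_{j+1} \wedge \bigwedge_{m=i-k+2}^{j+1} I_m)'$.
Similarly, since $F_{i-k+1} \Rightarrow F_i$, we get $G_{i-k+1} \wedge Tr \Rightarrow G'_{i-k+2}$.


By (♥), $\forall i < j \le N \cdot (F_j \wedge I_j) \wedge Tr \Rightarrow I'_{j+1}$. Again, since F is a trace, we
conclude that $\forall i < j \le N \cdot (F_j \wedge I_j) \wedge Tr \Rightarrow (F_{j+1} \wedge I_{j+1})'$. Combining the above,
$G_j \wedge Tr \Rightarrow G'_{j+1}$ for $0 \le j \le N$. Since F is safe and $I_{N+1} \Rightarrow \neg Bad$, G is
safe and stronger than F.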


Lemma 3 defines an obvious procedure to construct an (i, k)-extension trace
G for F. However, such a G is neither monotone nor clausal. In the rest of this
section, we describe the procedure KAVYEXTEND that starts with a sequence
interpolant (as in Lemma 3), but uses PDRBLOCK to systematically construct a
safe monotone clausal extension of F.

The procedure KAVYEXTEND is shown in Algorithm 4. For simplicity of the
presentation, we assume that PDRBLOCK does not use inductive generalization.
The invariants marked by † rely on this assumption. We stress that the
assumption is for presentation only. The correctness of KAVYEXTEND is
independent of it.

KAVYEXTEND starts with a sequence interpolant according to the partitioning
A. The extension trace G is initialized to F, and $G_{N+1}$ is initialized to $\top$
(line 2). The rest proceeds in three phases: Phase 1 (lines 3–5) computes the
prefix $G_{i-k+2}, \ldots, G_{i+1}$ using the first k − 1 elements of I; Phase 2 (lines 6–8)
computes $G_{i+1}$ using $I_{i+1}$; Phase 3 (lines 9–12) computes the suffix $G^{i+2}$ using
the last (N − i) elements of I. During this phase, PDRPUSH (line 12) pushes
clauses forward so that they can be used in the next iteration. The correctness
of the phases follows from the invariants shown in Algorithm 4. We present each
phase in turn.

Recall that PDRBLOCK takes a trace F (that is safe up to the last frame) and
a transition system, and returns a safe strengthening of F, while ensuring that
the result is monotone and clausal. Algorithm 4 maintains this guarantee
by requiring that any clause added to any frame $G_i$ of G is implicitly added to
all frames below $G_i$.


Phase 1. By Lemma 2, the first k elements of the sequence interpolant computed
at line 1 over-approximate states reachable in i + 1 steps of Tr. Phase 1 uses
this to strengthen $G_{i+1}$ using the first k elements of I. Note that in this phase,
new clauses are always added to frame $G_{i+1}$ and all frames before it.
Algorithm 4. KAVYEXTEND. The invariants marked † hold only when
PDRBLOCK does no inductive generalization.
Input: a monotone, clausal, safe trace F of size N
Input: A strong extension level (i, k) s.t. Tr[F^i]^k ∧ Bad(v̄_{N+1}) is unsatisfiable
Output: a monotone, clausal, safe trace G of size N + 1
1  I_{i−k+2}, ..., I_{N+1} ← SEQITP(Tr[F^i]^k ∧ Bad(v̄_{N+1}))
2  G ← [F_0, ..., F_N, ⊤]
3  for j ← i − k + 1 to i − 1 do
4    P_j ← (G_j ∨ (G_{i+1} ∧ I_{j+1}))
     // Inv_1: G is monotone and clausal
     // Inv_2: G_i ∧ Tr ⇒ P′_j
     // Inv_3†: ∀j ≤ m ≤ (i + 1) · G_m = F_m ∧ ⋀_{ℓ=i−k+1}^{j−1} (G_ℓ ∨ I_{ℓ+1})
     // Inv_3: ∀j ≤ m ≤ (i + 1) · G_m ⇒ F_m ∧ ⋀_{ℓ=i−k+1}^{j−1} (G_ℓ ∨ I_{ℓ+1})
5    [_, G_i, G_{i+1}] ← PDRBLOCK([Init, G_i, G_{i+1}], (Init, Tr, ¬P_j))
6  P_i ← (G_i ∨ (G_{i+1} ∧ I_{i+1}))
7  if i = 0 then [_, G_{i+1}] ← PDRBLOCK([Init, G_{i+1}], (Init, Tr, ¬P_i))
8  else [_, G_i, G_{i+1}] ← PDRBLOCK([Init, G_i, G_{i+1}], (Init, Tr, ¬P_i))
   // Inv_4†: G_{i+1} = F_{i+1} ∧ ⋀_{ℓ=i−k+1}^{i} (G_ℓ ∨ I_{ℓ+1})
   // Inv_4: G_{i+1} ⇒ F_{i+1} ∧ ⋀_{ℓ=i−k+1}^{i} (G_ℓ ∨ I_{ℓ+1})
9  for j ← i + 1 to N do
10   P_j ← (G_j ∨ (G_{j+1} ∧ I_{j+1}))
     // Inv_5: G_j ∧ Tr ⇒ P′_j
11   [_, G_j, G_{j+1}] ← PDRBLOCK([Init, G_j, G_{j+1}], (Init, Tr, ¬P_j))
12   G ← PDRPUSH(G)
   // Inv_6†: G is an (i, k)-extension trace of F
   // Inv_7: G is an extension trace of F
13 return G


Correctness of Phase 1 (line 5) follows from the loop invariant $Inv_2$. It holds
on loop entry since $G_i \wedge Tr \Rightarrow I'_{i-k+2}$ (since $G_i = F_i$ and (♥)) and
$G_i \wedge Tr \Rightarrow G'_{i+1}$ (since G is initially a trace). Let $G_i$ and $G_i^+$ be the i-th frame before and
after execution of iteration j of the loop, respectively. PDRBLOCK blocks $\neg P_j$
at iteration j of the loop. Assume that $Inv_2$ holds at the beginning of the loop.
Then, $G_i^+ \Rightarrow G_i \wedge P_j$ since PDRBLOCK strengthens $G_i$. Since $G_j \Rightarrow G_i$ and
$G_i \Rightarrow G_{i+1}$, this simplifies to $G_i^+ \Rightarrow G_j \vee (G_i \wedge I_{j+1})$. Finally, since G is a trace,
$Inv_2$ holds at the end of the iteration.

$Inv_2$ ensures that the trace given to PDRBLOCK at line 5 can be made safe
relative to $P_j$. From the post-condition of PDRBLOCK, it follows that at iteration
j, $G_{i+1}$ is strengthened to $G_{i+1}^+$ such that $G_{i+1}^+ \Rightarrow P_j$ and G remains a monotone
clausal trace. At the end of Phase 1, $[G_0, \ldots, G_{i+1}]$ is a clausal monotone trace.

Interestingly, the calls to PDRBLOCK in this phase do not satisfy an expected
pre-condition: the frame $G_i$ in $[Init, G_i, G_{i+1}]$ might not be safe for property $P_j$.
However, $Init \Rightarrow P_j$, and from $Inv_2$ it is clear that $P_j$ is inductive
relative to $G_i$. This is a sufficient precondition for PDRBLOCK.
Phase 2. This phase strengthens $G_{i+1}$ using the interpolant $I_{i+1}$. After Phase 2,
$G_{i+1}$ is k-inductive relative to $F_i$.


Phase 3. Unlike Phase 1, $G_{j+1}$ is computed at the j-th iteration. Because of
this, the property $P_j$ in this phase is slightly different from that of Phase 1.
Correctness follows from invariant $Inv_5$, which ensures that at iteration j, $G_{j+1}$
can be made safe relative to $P_j$. From the post-condition of PDRBLOCK, it
follows that $G_{j+1}$ is strengthened to $G_{j+1}^+$ such that $G_{j+1}^+ \Rightarrow P_j$ and G is
a monotone clausal trace. The invariant implies that at the end of the loop
$G_{N+1} \Rightarrow G_N \vee I_{N+1}$, making G safe. Thus, at the end of the loop G is a safe
monotone clausal trace that is stronger than F. What remains is to show that
$G_{i+1}$ is k-inductive relative to $F_i$.

Let $\varphi$ be the formula from Lemma 2. Assuming that PDRBLOCK did no
inductive generalization, Phase 1 maintains $Inv_3^\dagger$, which states that at iteration
j, PDRBLOCK strengthens frames $\{G_m\}$, $j \le m \le (i+1)$. $Inv_3^\dagger$ holds on loop
entry, since initially G = F. Let $G_m$ and $G_m^+$ ($j \le m \le (i+1)$) be frame m
at the beginning and at the end of the loop iteration, respectively. In the loop,
PDRBLOCK adds clauses that block $\neg P_j$. Thus, $G_m^+ = G_m \wedge P_j$. Since $G_j \Rightarrow G_m$
and $G_m \Rightarrow G_{i+1}$, this simplifies to $G_m^+ = G_m \wedge (G_j \vee I_{j+1})$. Expanding $G_m$, we get
$G_m^+ = F_m \wedge \bigwedge_{\ell=i-k+1}^{j} (G_\ell \vee I_{\ell+1})$. Thus, $Inv_3^\dagger$ holds at the end of the loop.

In particular, after line 8, $G_{i+1} = F_{i+1} \wedge \bigwedge_{\ell=i-k+1}^{i} (G_\ell \vee I_{\ell+1})$.
Since $\varphi \Rightarrow G_{i+1}$, $G_{i+1}$ is k-inductive relative to $F_i$.


Theorem 2. Given a safe trace F of size N and an SEL (i, k) for F, KAVYEXTEND
returns a clausal monotone extension trace G of size N + 1. Furthermore, if
PDRBLOCK does no inductive generalization, then G is an (i, k)-extension trace.


Of course, assuming that PDRBLOCK does no inductive generalization is
not realistic. KAVYEXTEND remains correct without the assumption: it returns
a trace G that is a monotone clausal extension of F. However, G might be
stronger than any (i, k)-extension of F. The invariants marked with † are then
relaxed to their unmarked versions. Overall, inductive generalization improves
KAVYEXTEND since it is not restricted to only a k-inductive strengthening.

Importantly, the output of KAVYEXTEND is a regular inductive trace. Thus, 
KAVYEXTEND is a procedure to strengthen a (relatively) k-inductive certificate 
to a (relatively) 1-inductive certificate. Hence, after KAVYEXTEND, any strategy 
for further generalization or trace extension from IC3, PDR, or Avy is applicable. 


4.2 Searching for the Maximal SEL 


In this section, we describe two algorithms for computing the maximal SEL.
Both algorithms can be used to implement line 5 of Algorithm 3. They perform
a guided search for group minimal unsatisfiable subsets. They terminate when
having fewer clauses would not increase the SEL further. The first, called
top-down, starts from the largest unrolling of Tr and then reduces the length of
the unrolling. The second, called bottom-up, finds the largest (regular) extension
level first, and then grows it using strong induction.
Algorithm 5. A top-down algorithm for the maximal SEL.
Input: A transition system T = (Init, Tr, Bad)
Input: An extendable monotone clausal safe trace F of size N
Output: max(K(F))
1  i ← N
2  while i > 0 do
3    if ¬isSat(Tr[F^i]^{i+1} ∧ Bad(v̄_{N+1})) then break
4    i ← (i − 1)
5  k ← 1
6  while k < i + 1 do
7    if ¬isSat(Tr[F^i]^k ∧ Bad(v̄_{N+1})) then break
8    k ← (k + 1)
9  return (i, k)

Algorithm 6. A bottom-up algorithm for the maximal SEL.
Input: A transition system T = (Init, Tr, Bad)
Input: An extendable monotone clausal safe trace F of size N
Output: max(K(F))
1  j ← N
2  while j > 0 do
3    if ¬isSat(Tr[F^j]^1 ∧ Bad(v̄_{N+1})) then break
4    j ← (j − 1)
5  (i, k) ← (j, 1); j ← (j + 1); ℓ ← 2
6  while ℓ ≤ (j + 1) ∧ j ≤ N do
7    if isSat(Tr[F^j]^ℓ ∧ Bad(v̄_{N+1})) then ℓ ← (ℓ + 1)
8    else
9      (i, k) ← (j, ℓ)
10     j ← (j + 1)
11 return (i, k)


Top-Down SEL. A pair (i, k) is the maximal SEL iff

$$i = \max\{j \mid 0 \le j \le N \cdot Tr[F^j]^{j+1} \wedge Bad(\bar{v}_{N+1}) \Rightarrow \bot\}$$
$$k = \min\{\ell \mid 1 \le \ell \le (i+1) \cdot Tr[F^i]^{\ell} \wedge Bad(\bar{v}_{N+1}) \Rightarrow \bot\}$$

Note that k depends on i. For a SEL $(i, k) \in K(F)$, we refer to the formula
$Tr[F^i]$ as a suffix and to the number k as the depth of induction. Thus, the search
can be split into two phases: (a) find the smallest suffix while using the maximal
depth of induction allowed (for that suffix), and (b) minimize the depth of
induction k for the value of i found in step (a). This is captured in Algorithm 5.
The algorithm requires at most (N + 1) SAT queries. One downside, however, is
that the formulas constructed in the first phase (line 3) are large because the
depth of induction is the maximum possible.
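Algorithm 5 only needs an oracle that answers whether a given (i, k) is an SEL. The Python sketch below mirrors its two phases; the callable `is_sel(i, k)` is a hypothetical stand-in for the SAT query $\neg\mathit{isSat}(Tr[F^i]^k \wedge Bad(\bar{v}_{N+1}))$, and the toy oracle in the usage example (a minimal depth of induction per level that does not decrease with the level) is invented purely for illustration.

```python
from typing import Callable, Tuple


def top_down_max_sel(N: int, is_sel: Callable[[int, int], bool]) -> Tuple[int, int]:
    """Sketch of Algorithm 5. is_sel(i, k) stands in for the SAT query
    not isSat(Tr[F^i]^k & Bad(v_{N+1}))."""
    i = N
    while i > 0:              # phase (a): smallest suffix, maximal depth i + 1
        if is_sel(i, i + 1):
            break
        i -= 1
    k = 1
    while k < i + 1:          # phase (b): minimize the depth of induction
        if is_sel(i, k):
            break
        k += 1
    return i, k


# Hypothetical oracle: level i needs depth at least NEED[i] (and k <= i + 1).
NEED = [1, 1, 2, 2, 3, 7]


def toy_is_sel(i: int, k: int) -> bool:
    return NEED[i] <= k <= i + 1


print(top_down_max_sel(5, toy_is_sel))  # (4, 3)
```

Level 5 is rejected in phase (a) because it would need depth 7 > 6; phase (b) then reduces the depth at level 4 from 5 down to 3.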


Bottom-Up SEL. Algorithm 6 searches for a SEL by first finding a maximal
regular extension level (line 2) and then searching for larger SELs (lines 6 to
10). Observe that if $(j, \ell) \notin K(F)$, then $\forall p > j \cdot (p, \ell) \notin K(F)$. This is used at
line 7 to increase the depth of induction once it is known that $(j, \ell) \notin K(F)$. On
the other hand, if $(j, \ell) \in K(F)$, there might be a larger SEL $(j + 1, \ell)$. Thus,
whenever a SEL $(j, \ell)$ is found, it is stored in (i, k) and the search continues (line
10). The algorithm terminates when there are no more valid SEL candidates and
returns the last valid SEL. Note that $\ell$ is incremented only when there does not
exist a larger SEL with the current value of $\ell$. Thus, for each valid level j, if
there exist SELs with level j, the algorithm is guaranteed to find the largest
such SEL. Moreover, the level is increased at every possible opportunity. Hence,
at the end $(i, k) = \max(K(F))$.
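The bottom-up search can be sketched in the same style. As before, `is_sel(i, k)` is a hypothetical oracle standing in for the SAT query of line 7, and the toy oracle is invented; the sketch relies on the property stated above, that a failing $(j, \ell)$ also fails for every larger level, which the toy oracle satisfies because its required depth never decreases.

```python
from typing import Callable, Tuple


def bottom_up_max_sel(N: int, is_sel: Callable[[int, int], bool]) -> Tuple[int, int]:
    """Sketch of Algorithm 6 with a hypothetical SEL oracle is_sel(i, k)."""
    j = N
    while j > 0:              # largest regular extension level (depth 1)
        if is_sel(j, 1):
            break
        j -= 1
    i, k = j, 1
    j, depth = j + 1, 2
    while depth <= j + 1 and j <= N:
        if not is_sel(j, depth):
            depth += 1        # (j, depth) fails, hence (p, depth) fails for p > j
        else:
            i, k = j, depth   # record the larger SEL and try the next level
            j += 1
    return i, k


NEED = [1, 1, 2, 2, 3, 7]     # hypothetical minimal depth of induction per level


def toy_is_sel(i: int, k: int) -> bool:
    return NEED[i] <= k <= i + 1


print(bottom_up_max_sel(5, toy_is_sel))  # (4, 3)
```

On this oracle the search visits (1, 1), (2, 2), (3, 2), and (4, 3) in turn, so it can be stopped early with a valid, if sub-optimal, SEL at any point.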
Fig. 2. Runtime comparison on SAFE HWMCC instances (a) and shift instances (b).


In the worst case, Algorithm 6 makes at most 3N SAT queries. However,
compared to Algorithm 5, the queries are smaller. Moreover, the computation is
incremental and can be aborted with a sub-optimal solution after execution of
line 5 or line 9. Note that at line 5, i is a regular extension level (i.e., as in Avy),
and every execution of line 9 results in a larger SEL.


5 Evaluation 


We implemented KAvy on top of the Avy Model Checker¹. For line 5 of
Algorithm 3, we used Algorithm 5. We evaluated KAvy's performance against a version
of Avy [29] from the Hardware Model Checking Competition 2017 [5], and the
PDR engine of ABC [13]. We used the benchmarks from HWMCC'14, '15,
and '17. Benchmarks that are not solved by any of the solvers are excluded from
the presentation. The experiments were conducted on a cluster running Intel
E5-2683 V4 CPUs at 2.1 GHz with an 8 GB RAM limit and a 30 min time limit.

The results are summarized in Table 1. HWMCC has a wide variety of
benchmarks. We aggregate the results by competition and by benchmark
origin (based on the name). Some named categories (e.g., intel) include
benchmarks that have not been included in any competition. The first column in
Table 1 indicates the category. Total is the number of all available benchmarks,
ignoring duplicates; that is, a benchmark that appears in multiple categories
is counted only once. Numbers in brackets indicate the number of instances
that are solved uniquely by the solver. For example, KAvy solves 14 instances
in oc8051 that are not solved by any other solver. The VBS column indicates
the Virtual Best Solver: the result of running all three solvers in parallel
and stopping as soon as one solver terminates successfully.
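The bracketed "unique" counts and the VBS column can be derived mechanically from per-solver results. The sketch below is a hypothetical illustration, not the authors' evaluation scripts: it assumes each solver's results are given as a mapping from instance name to run time in minutes (None for a timeout), sums time over solved instances only (as in Table 1), and models the VBS as taking, for each instance, the fastest successful solver.

```python
from typing import Dict, Optional

# solver name -> instance name -> run time in minutes (None = unsolved/timeout)
Results = Dict[str, Dict[str, Optional[float]]]


def summarize(results: Results) -> Dict[str, Dict[str, float]]:
    """Per-solver solved/unique counts and time, plus a Virtual Best Solver row."""
    solved = {s: {i for i, t in r.items() if t is not None} for s, r in results.items()}
    summary: Dict[str, Dict[str, float]] = {}
    for s in results:
        # Instances solved by at least one *other* solver.
        others = set().union(*(solved[o] for o in results if o != s))
        summary[s] = {
            "solved": len(solved[s]),
            "unique": len(solved[s] - others),
            # Time is summed over solved instances only; timeouts are ignored.
            "time": sum(results[s][i] for i in solved[s]),
        }
    # VBS: run all solvers in parallel and stop at the first success,
    # so each instance costs the time of its fastest successful solver.
    vbs = set().union(*solved.values())
    summary["VBS"] = {
        "solved": len(vbs),
        "time": sum(min(results[s][i] for s in results if i in solved[s]) for i in vbs),
    }
    return summary


# Hypothetical toy data, not taken from Table 1.
toy = {
    "KAvy": {"i1": 1.0, "i2": None, "i3": 2.0, "i4": 0.5},
    "Avy": {"i1": 3.0, "i2": None, "i3": None, "i4": None},
    "PDR": {"i1": None, "i2": 5.0, "i3": 4.0, "i4": None},
}
print(summarize(toy)["VBS"])  # {'solved': 4, 'time': 8.5}
```

In the real table, SAFE and UNSAFE instances would be summarized separately by filtering the instance sets first.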

Overall, KAvy solves more SAFE instances than both Avy and PDR, while
taking less time than Avy (we report time for solved instances, ignoring timeouts).
The VBS column shows that KAvy is a promising new strategy, significantly
improving overall performance. In the rest of this section, we analyze the


1 All code, benchmarks, and results are available at https://arieg.bitbucket.io/avy/. 
Table 1. Summary of instances solved by each tool. Timeouts were ignored when
computing the time column.


            | KAvy                        | Avy                         | PDR                         | VBS
Benchmarks  | SAFE     UNSAFE   Time(m)   | SAFE     UNSAFE   Time(m)   | SAFE     UNSAFE   Time(m)   | SAFE  UNSAFE
HWMCC'17    | 137 (16) 38       499       | 128 (3)  38       406       | 109 (6)  40 (5)   174       | 150   44
HWMCC'15    | 193 (4)  84       412       | 191 (3)  92 (6)   597       | 194 (16) 67 (12)  310       | 218   104
HWMCC'14    | 49       27 (1)   124       | 58 (4)   26       258       | 55 (6)   19 (2)   172       | 64    29
intel       | 32 (1)   9        196       | 32 (1)   9        218       | 19       5 (1)    40        | 33    10
6s          | 73 (2)   20       157       | 81 (4)   21 (1)   329       | 67 (3)   14       51        | 86    21
nusmv       | 13       0        5         | 14       0        29        | 16 (2)   0        38        | 16    0
bob         | 30       5        21        | 30       6 (1)    30        | 30 (1)   8 (3)    32        | 31
pdt         | 45       1        54        | 45 (1)   1        57        | 47 (3)   1        62        | 49    1
oski        | 26       89 (1)   174       | 28 (2)   92 (4)   217       | 20       53       63        | 28    93
beem        | 10       1        49        | 10       2        32        | 20 (8)   7 (5)    133       | 20    7
oc8051      | 34 (14)  0        286       | 20       0        99        | 6 (1)    1 (1)    77        | 35    1
power       | 4        0        25        | 3        0        3         | 8 (4)    0        31        | 8     0
shift       | 5 (2)    0        1         | 1        0        18        | 3        0        1         | 5     0
necla       | 5        0        4         | 7 (1)    0        1         | 5 (1)    0        4         | 8     0
prodcell    | 0        0        0         | 0        1        28        | 0        4 (3)    2         | 0     4
bc57        | 0        0        0         | 0        0        0         | 0        4 (4)    9         | 0     4
Total       | 326 (19) 141 (1)  957       | 319 (8)  148 (6)  1041      | 304 (25) 117 (17) 567       | 370   167


results in more detail, provide a detailed run-time comparison between the tools,
and isolate the effect of the new k-inductive strategy.

To compare the running time, we present scatter plots comparing KAvy
and Avy (Fig. 3a), and KAvy and PDR (Fig. 3b). In both figures, KAvy is on
the x-axis, so points above the diagonal are better for KAvy. Compared to Avy,
whenever an instance is solved by both solvers, KAvy is often faster, sometimes
by orders of magnitude. Compared to PDR, KAvy and PDR perform well on
very different instances. This is similar to the observation made by the authors
of the original paper that presented Avy [29]. Another indicator of performance
is the depth of convergence, summarized in Fig. 3d and e. KAvy often
converges much sooner than Avy. The comparison with PDR is less clear, which
is consistent with the difference in performance between the two. To get the
whole picture, Fig. 2a presents a cactus plot that compares the running times of
the algorithms on all these benchmarks.

To isolate the effects of k-induction, we compare KAvy to a version of KAvy
with k-induction disabled, which we call VANILLA. Conceptually, VANILLA is
similar to Avy since it extends the trace using a 1-inductive extension trace,
but its implementation is based on KAvy. The results for the running time and
the depth of convergence are shown in Fig. 3c and f, respectively. The results
are very clear: using strong extension traces significantly improves performance
and has a non-negligible effect on the depth of convergence.

Finally, we discovered one family of benchmarks, called shift, on which KAvy
performs orders of magnitude better than all other techniques. The benchmarks
come from encoding bit-vector decision problems into circuits [21,30]. The shift
family corresponds to deciding satisfiability of (x + y) = (x << 1) for two
Fig. 3. Comparing running time ((a), (b), (c)) and depth of convergence ((d), (e), (f))
of Avy, PDR, and VANILLA with KAvy. KAvy is shown on the x-axis. Points above the
diagonal are better for KAvy. Only those instances that have been solved by both
solvers are shown in each plot.


bit-vectors x and y. The family is parameterized by bit-width. The property is
k-inductive, where k is the bit-width of x. The results of running Avy, PDR,
k-induction², and KAvy are shown in Fig. 2b. Except for KAvy, all techniques
exhibit exponential behavior in the bit-width, while KAvy's running time remains
constant. Deeper analysis indicates that KAvy finds a small inductive invariant while
exploring just two steps in the execution of the circuit. At the same time, neither
inductive generalization nor k-induction alone is able to consistently find the
same invariant quickly.


6 Conclusion 


In this paper, we present KAvy, an SMC algorithm that effectively uses
k-inductive reasoning to guide interpolation and inductive generalization. KAvy
searches both for a good inductive strengthening and for the most effective
induction depth k. We have implemented KAvy on top of the Avy model checker. The
experimental results on HWMCC instances show that our approach is effective.

The search for the maximal SEL is an overhead in KAvy. There could be
benchmarks in which this overhead outweighs its benefits, although we have not
come across such benchmarks so far. In such cases, KAvy can choose to settle
for a sub-optimal SEL, as mentioned in Sect. 4.2. Deciding when and how much
to settle for remains a challenge.


2 We used the k-induction engine ind in ABC [8].


Interpolating Strong Induction 383 


Acknowledgements. We thank the anonymous reviewers and Oded Padon for their
thorough review and insightful comments. This research was enabled in part by
support provided by Compute Ontario (https://computeontario.ca/), Compute Canada
(https://www.computecanada.ca/) and grants from the Natural Sciences and
Engineering Research Council of Canada.


References 


1. Audemard, G., Lagniez, J.-M., Szczepanski, N., Tabary, S.: An adaptive parallel SAT solver. In: Rueher, M. (ed.) CP 2016. LNCS, vol. 9892, pp. 30-48. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44953-1_3

2. Belov, A., Marques-Silva, J.: MUSer2: an efficient MUS extractor. JSAT 8(3/4), 123-128 (2012)

3. Berryhill, R., Ivrii, A., Veira, N., Veneris, A.G.: Learning support sets in IC3 and Quip: the good, the bad, and the ugly. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, 2-6 October 2017, pp. 140-147 (2017)

4. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193-207. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0_14

5. Biere, A., van Dijk, T., Heljanko, K.: Hardware model checking competition 2017. In: Stewart, D., Weissenbacher, G. (eds.) 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, 2-6 October 2017, p. 9. IEEE (2017)

6. Bjørner, N., Gurfinkel, A., McMillan, K., Rybalchenko, A.: Horn clause solvers for program verification. In: Beklemishev, L.D., Blass, A., Dershowitz, N., Finkbeiner, B., Schulte, W. (eds.) Fields of Logic and Computation II. LNCS, vol. 9300, pp. 24-51. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23534-9_2

7. Bradley, A.R.: SAT-based model checking without unrolling. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 70-87. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4_7

8. Brayton, R., Mishchenko, A.: ABC: an academic industrial-strength verification tool. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 24-40. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_5

9. Champion, A., Mebsout, A., Sticksel, C., Tinelli, C.: The KIND 2 model checker. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 510-517. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41540-6_29

10. Craig, W.: Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory. J. Symb. Log. 22(3), 269-285 (1957)

11. de Moura, L., et al.: SAL 2. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 496-500. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27813-9_45

12. Eén, N., Mishchenko, A., Amla, N.: A single-instance incremental SAT formulation of proof- and counterexample-based abstraction. In: Proceedings of 10th International Conference on Formal Methods in Computer-Aided Design, FMCAD 2010, Lugano, Switzerland, 20-23 October, pp. 181-188 (2010)

13. Eén, N., Mishchenko, A., Brayton, R.K.: Efficient implementation of property directed reachability. In: International Conference on Formal Methods in Computer-Aided Design, FMCAD 2011, Austin, TX, USA, 30 October-2 November 2011, pp. 125-134 (2011)


H. G. Vediramana Krishnan et al.

14. Garoche, P.-L., Kahsai, T., Tinelli, C.: Incremental invariant generation using logic-based automatic abstract transformers. In: Brat, G., Rungta, N., Venet, A. (eds.) NFM 2013. LNCS, vol. 7871, pp. 139-154. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38088-4_10

15. Gurfinkel, A., Ivrii, A.: Pushing to the top. In: Formal Methods in Computer-Aided Design, FMCAD 2015, Austin, Texas, USA, 27-30 September 2015, pp. 65-72 (2015)

16. Gurfinkel, A., Ivrii, A.: K-induction without unrolling. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, 2-6 October 2017, pp. 148-155 (2017)

17. Heule, M., Hunt Jr., W.A., Wetzler, N.: Trimming while checking clausal proofs. In: Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, 20-23 October 2013, pp. 181-188 (2013)

18. Järvisalo, M., Heule, M.J.H., Biere, A.: Inprocessing rules. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 355-370. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31365-3_28

19. Jovanovic, D., Dutertre, B.: Property-directed k-induction. In: 2016 Formal Methods in Computer-Aided Design, FMCAD 2016, Mountain View, CA, USA, 3-6 October 2016, pp. 85-92 (2016)

20. Kahsai, T., Ge, Y., Tinelli, C.: Instantiation-based invariant discovery. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 192-206. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_15

21. Kovásznai, G., Fröhlich, A., Biere, A.: Complexity of fixed-size bit-vector logics. Theory Comput. Syst. 59(2), 323-376 (2016)

22. Liang, J.H., Ganesh, V., Poupart, P., Czarnecki, K.: Learning rate based branching heuristic for SAT solvers. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS, vol. 9710, pp. 123-140. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40970-2_9

23. Liang, J.H., Oh, C., Mathew, M., Thomas, C., Li, C., Ganesh, V.: Machine learning-based restart policy for CDCL SAT solvers. In: Beyersdorff, O., Wintersteiger, C.M. (eds.) SAT 2018. LNCS, vol. 10929, pp. 94-110. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94144-8_6

24. McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1-13. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_1

25. McMillan, K.L.: Interpolation and model checking. In: Clarke, E., Henzinger, T., Veith, H., Bloem, R. (eds.) Handbook of Model Checking, pp. 421-446. Springer, Cham (2018)

26. Mebsout, A., Tinelli, C.: Proof certificates for SMT-based model checkers for infinite-state systems. In: 2016 Formal Methods in Computer-Aided Design, FMCAD 2016, Mountain View, CA, USA, 3-6 October 2016, pp. 117-124 (2016)

27. Sheeran, M., Singh, S., Stålmarck, G.: Checking safety properties using induction and a SAT-solver. In: Hunt, W.A., Johnson, S.D. (eds.) FMCAD 2000. LNCS, vol. 1954, pp. 127-144. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-40922-X_8

28. Vizel, Y., Grumberg, O.: Interpolation-sequence based model checking. In: Proceedings of 9th International Conference on Formal Methods in Computer-Aided Design, FMCAD 2009, 15-18 November 2009, Austin, Texas, USA, pp. 1-8 (2009)




29. Vizel, Y., Gurfinkel, A.: Interpolating property directed reachability. In: Biere, A., 
Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 260-276. Springer, Cham (2014). 
https://doi.org/10.1007/978-3-319-08867-9_17

30. Vizel, Y., Nadel, A., Malik, S.: Solving linear arithmetic with SAT-based model 
checking. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, 
Vienna, Austria, 2-6 October 2017, pp. 47-54 (2017) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




Verifying Asynchronous Event-Driven 
Programs Using Partial Abstract 
Transformers 


Peizun Liu¹ (✉), Thomas Wahl¹, and Akash Lal²


1 Northeastern University, Boston, USA 
lpzun@ccs.neu.edu 
2 Microsoft Research, Bangalore, India 


Abstract. We address the problem of analyzing asynchronous event-driven
programs, in which concurrent agents communicate via unbounded message
queues. The safety verification problem for such programs is undecidable. We
present in this paper a technique that combines queue-bounded exploration with
a convergence test: if the sequence of certain abstractions of the reachable
states, for increasing queue bounds k, converges, we can prove any property of
the program that is preserved by the abstraction. If the abstract state space is
finite, convergence is guaranteed; the challenge is to catch the point $k_{\max}$
where it happens. We further demonstrate how simple invariants formulated over
the concrete domain can be used to eliminate spurious abstract states, which
otherwise prevent the sequence from converging. We have implemented our
technique for the P programming language for event-driven programs. We show
experimentally that the sequence of abstractions often converges fully
automatically, in hard cases with minimal designer support in the form of
sequentially provable invariants, and that this happens for a value of $k_{\max}$
small enough to allow the method to succeed in practice.


1 Introduction 


Asynchronous event-driven (AED) programming refers to a style of programming 
multi-agent applications. The agents communicate shared work via messages. 
Each agent waits for a message to arrive, and then processes it, possibly sending 
messages to other agents, in order to collectively achieve a goal. This program- 
ming style is common for distributed systems as well as low-level designs such as 
device drivers [11]. Getting such applications right is an arduous task, due to the 
inherent concurrency: the programmer must defend against all possible
interleavings of messages between agents. In response to this challenge, recent years have
seen multiple approaches to verifying AED-like programs, e.g. by delaying send
actions, or temporarily bounding their number (to keep queue sizes small) [7,10],
or by reasoning about a small number of representative execution schedules, to
avoid interleaving explosion [5].

In this paper we consider the P language for AED programming [11]. A P pro- 
gram consists of multiple state machines running in parallel. Each machine has 
a local store, and a message queue through which it receives events from other 
machines. P allows the programmer to formulate safety specifications via a state- 
ment that asserts some predicate over the local state of a single machine. Verify- 
ing such reachability properties of course requires reasoning over global system 
behavior and is, for unbounded-queue P programs, undecidable [8]. 

The unboundedness of the reachable state space does not prevent the use of
testing tools that try to explore as much of the state space as possible [3,6,11,13]
in the quest for bugs. Somewhat inspired by this kind of approach, the goal of this
paper is a verification technique that can (sometimes) prove a safety property,
despite exploring only a finite fraction of that space. Our approach is as follows.
Assuming that the machines' queues are the only source of unboundedness, we
consider a bound k on the queue size, and exhaustively compute the reachable
states $R_k$ of the resulting finite-state problem, checking the local assertion $\varphi$
along the way. We then increase the queue bound until (an error is found, or) we
reach some point $k_{\max}$ of convergence: a point that allows us to conclude that
increasing k further is not required to prove $\varphi$.

What kind of “convergence” are we targeting? We design a sequence $(\hat{R}_k)_{k=0}^{\infty}$
of abstractions of each reachability set over a finite abstract state space. Since
this space is finite and the sequence $(\hat{R}_k)_{k=0}^{\infty}$ is monotone, convergence
is ensured, i.e. there exists $k_{\max}$ such that $\hat{R}_K = \hat{R}_{k_{\max}}$ for all
$K > k_{\max}$. Provided that an abstract state satisfies $\varphi$ exactly if all its
concretizations do, we have: if all abstract states in $\hat{R}_{k_{\max}}$ comply with
$\varphi$, then so do all reachable concrete states of P, and we have proved the property.

We implement this strategy using an abstraction function $\alpha$ with a finite
co-domain that leaves the local state of a machine unchanged and maintains 
the first occurrence of each event in the queue; repeat occurrences are dropped. 
This abstraction preserves properties over the local state and the head of the 
queue, i.e. the visible (to the machine) part of the state space, which is typically 
sufficient to express reachability properties. 

The second major step in our approach is the detection of the point of
convergence of $(\hat{R}_k)_{k=0}^{\infty}$: We show that, for the best abstract transformer
$\widehat{\mathit{Im}}$ [9,27, see Sect. 4.2], if $\widehat{\mathit{Im}}(\hat{R}_k) \subseteq \hat{R}_k$, then
$\hat{R}_K = \hat{R}_k$ for all $K > k$. In fact, we have a stronger result: under an
easy-to-enforce condition, it suffices to consider abstract dequeue operations: all
others, namely enqueue and local actions, never lead to abstract states in
$\hat{R}_{k+1} \setminus \hat{R}_k$. The best abstract transformer for dequeue actions
is efficiently implementable for a given P program.

It is of course possible that the convergence condition $\widehat{\mathit{Im}}(\hat{R}_k) \subseteq \hat{R}_k$
never holds (the problem is undecidable). This manifests in the presence of a spurious
abstract state in the image produced by $\widehat{\mathit{Im}}$, i.e. one whose concretization does
not contain any reachable state. Our third contribution is a technique to assist
users in eliminating such states, enhancing the chances for convergence. We
have observed that spurious abstract states are often due to violations of simple
machine invariants: invariants that do not depend on the behavior of other
machines. By their nature, they can be proved using a cheap sequential analysis.

We can eliminate an abstract state (e.g. produced by $\widehat{\mathit{Im}}$) if all its
concretizations violate a machine invariant. In this paper, we propose a domain-specific
temporal logic to express invariants over machines with event queues and, more 
importantly, an algorithm that decides the above abstract queue invariant check- 
ing problem, by reducing it efficiently to a plain model checking problem. We 
have used this technique to ensure the convergence in “hard” cases that otherwise 
defy convergence of the abstract reachable states sequence. 

We have implemented our technique for the P language and empirically eval- 
uated it on an extensive set of benchmark programs. The experimental results 
support the following conclusions: (i) for our benchmark programs, the sequence 
of abstractions often converges fully automatically, in hard cases with minimal 
designer support in the form of separately dischargeable invariants; (ii) almost all 
examples converge at a small value of $k_{\max}$; and (iii) the overhead our technique
adds to the bounding technique is small: the bulk is spent on the exhaustive 
bounded exploration itself. 

Proofs and other supporting material can be found in the Appendix of [23]. 


2 Overview 


We illustrate the main ideas of this paper using an example in the P language. 
A machine in a P program consists of multiple states. Each state defines an entry 
code block that is executed when the machine enters the state. The state also 
defines handlers for each event type e that it is prepared to receive. A handler 
can either be on e do foo (executing foo on receiving e), or ignore e (dequeuing 
and dropping e). A state can also have a defer e declaration; the semantics is that 
a machine dequeues the first non-deferred event in its queue. As a result, a queue 
in a P program is not strictly FIFO. This relaxation is an important feature of 
P that helps programmers express their logic compactly [11]. Figure 1 shows a P 
program named PiFl, in which a Sender (eventually) floods a Receiver’s queue 
with PING events. This queue is the only source of unboundedness in PiFl. 

A critical property for P programs is (bounded) responsiveness: the receiving 
machine must have a handler (e.g. on, defer, ignore) for every event arriving at the 
queue head; otherwise the event will come as a “surprise” and crash the machine. 
To prove responsiveness for PiFl, we have to demonstrate (among others) that 
in state Ignore_it, the DONE event is never at the head of the Receiver’s queue. 
We cannot perform exhaustive model checking, since the set of reachable states 
is infinite. Instead, we will compute a conservative abstraction of this set that is 
precise enough to rule out DONE events at the queue head in this state. 

We first define a suitable abstraction function $\alpha$ that collapses repeated
occurrences of events to each event's first occurrence. For instance, the queue


Q = PRIME.PRIME.PRIME.DONE.PING.PING.PING.PING (1) 
    event PRIME, DONE, PING;

    machine Sender {
      var receiver: machine;
      start state Init {
        entry {
          receiver = new Receiver();
          goto Prime_it;
        }
      }
      state Prime_it {
        entry {
          var i: int;
          while (i < 3) { // 3x PRIME
            send receiver, PRIME; i = i + 1;
          }
          send receiver, DONE; goto Ping_it;
        }
      }
      state Ping_it {
        entry {
          send receiver, PING; goto Ping_it;
        }
      }
    }

    machine Receiver {
      start state Init {
        defer PRIME;
        on DONE goto Ignore_it;
      }

      state Ignore_it {
        ignore PRIME, PING;
      }
    }


Fig. 1. PiFl: a Ping-Flood scenario. The Sender and the Receiver communicate via
events of types PRIME, DONE, and PING. After sending some PRIME events and one
DONE, the Sender floods the Receiver with PINGs. The Receiver initially defers PRIMEs.
Upon receiving DONE it enters a state in which it ignores PING.


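will be abstracted to $\hat{Q} = \alpha(Q) = $ PRIME.DONE.PING. The number of
possible abstract queues is finite: counting the empty queue and all
repetition-free sequences of length 1, 2, and 3 over the three events gives
$1 + 3 + 3 \cdot 2 + 3 \cdot 2 \cdot 1 = 16$. The abstraction preserves the head of
the queue. This, together with the machine state, provides enough information to
check responsiveness.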
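The count of 16 can be double-checked by brute force: enumerate concrete queues over PiFl's three events and collect their abstractions. A minimal Python sketch; the function name alpha0 and the length bound 6 are illustrative choices (any bound of at least 3, the alphabet size, already reaches every abstract queue).

```python
from itertools import product

EVENTS = ("PRIME", "DONE", "PING")


def alpha0(queue):
    """Abstraction with no preserved prefix: keep only the first occurrence of each event."""
    out = []
    for e in queue:
        if e not in out:
            out.append(e)
    return tuple(out)


# Queue (1) from the text.
Q = ("PRIME",) * 3 + ("DONE",) + ("PING",) * 4
print(alpha0(Q))  # ('PRIME', 'DONE', 'PING')

# Abstract every concrete queue of length 0..6 and count the distinct results.
abstract = {alpha0(q) for n in range(7) for q in product(EVENTS, repeat=n)}
print(len(abstract))  # 16
```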

We now generate the sequence $\hat{R}_k$ of abstractions of the reachable state
sets $R_k$ for queue size bounds $k = 0, 1, 2, \ldots$, by computing each finite set $R_k$,
and then $\hat{R}_k$ as $\alpha(R_k)$. The obtained monotone sequence $(\hat{R}_k)_{k=0}^{\infty}$
over a finite domain will eventually converge, but we must prove that it has. This
is done by applying the best abstract transformer $\widehat{\mathit{Im}}$, restricted to dequeue
operations (defined in Sect. 4.2), to the current set $\hat{R}_k$, and confirming that
the result is contained in $\hat{R}_k$.
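The overall loop can be summarized as a small driver. This is a hypothetical sketch, not the authors' tool: the callbacks abstract_reach(k) (returning $\alpha(R_k)$), deq_image(S) (the best abstract transformer restricted to dequeue operations, applied to S) and prop(s) are assumed to be supplied by the caller. Following the overview, the convergence test is only run once the abstract set repeats, and abstract states are checked against the property on the assumption that $\alpha$ respects it.

```python
from typing import Callable, FrozenSet, Hashable, Optional, Tuple

AbsSet = FrozenSet[Hashable]


def find_kmax(
    abstract_reach: Callable[[int], AbsSet],
    deq_image: Callable[[AbsSet], AbsSet],
    prop: Callable[[Hashable], bool],
    max_k: int,
) -> Tuple[str, Optional[int]]:
    prev: Optional[AbsSet] = None
    for k in range(max_k + 1):
        cur = abstract_reach(k)
        # Assumes alpha respects prop, so a bad abstract state reflects a bad concrete one.
        if not all(prop(s) for s in cur):
            return ("violation", k)
        # Run the convergence test only when the sequence stalls.
        if cur == prev and deq_image(cur) <= cur:
            return ("converged", k)
        prev = cur
    return ("unknown", None)


# Hypothetical toy: a single queue fed with event 'a' forever, abstracted with p = 0.
def toy_reach(k: int) -> AbsSet:
    return frozenset({()}) if k == 0 else frozenset({(), ("a",)})


def toy_deq(S: AbsSet) -> AbsSet:
    # Dequeuing from any concretization a+ of ('a',) leaves a*, which abstracts to {(), ('a',)}.
    return frozenset({(), ("a",)}) if ("a",) in S else frozenset()


print(find_kmax(toy_reach, toy_deq, lambda s: True, max_k=10))  # ('converged', 2)
```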

As it turns out, the confirmation fails for the PiFl program: k = 5 marks
the first time the set $\hat{R}_k$ repeats, i.e. $\hat{R}_4 = \hat{R}_5$, so we are motivated to run the
convergence test. Unfortunately, we find a state $\hat{s} \in \widehat{\mathit{Im}}(\hat{R}_5) \setminus \hat{R}_5$, preventing
convergence. Our approach now offers two remedies to this dilemma. One is to
refine the queue abstraction. In our implementation, function $\alpha$ is really $\alpha_p$,
for a parameter p that denotes the size of the prefix of the queue that is kept
unchanged by the abstraction. For example, for the queue from Eq. (1) we have
$\alpha_4(Q) = $ PRIME.PRIME.PRIME.DONE | PING, where | separates the prefix from
the “infinite tail” of the abstract queue. This (straightforward) refinement
maintains finiteness of the abstraction and increases precision, by revealing that the
queue starts with three PRIME events. Re-running the analysis for the PiFl
program with p = 4, at k = 5 we find $\widehat{\mathit{Im}}(\hat{R}_5) \subseteq \hat{R}_5$, and the proof is complete.

The second remedy to the failed convergence test dilemma is more powerful
but also less automatic. Let's revert to prefix p = 0 and inspect the abstract
state $\hat{s} \in \widehat{\mathit{Im}}(\hat{R}_5) \setminus \hat{R}_5$ that foils the test. We find that it features a DONE event
followed by a PRIME event in the Receiver's queue. A simple static analysis of the
Sender's machine in isolation shows that it permits no path from the send DONE
to the send PRIME statement. The behavior of other machines is irrelevant for
this invariant; we call it a machine invariant. We pass the invariant to our tool
via the command line using the expression


$$\mathbf{G}\,(\mathrm{DONE} \rightarrow \mathbf{G}\,\neg\mathrm{PRIME}) \qquad (2)$$


in a temporal-logic-like notation called QuTL (Sect. 5.1), where G universally
quantifies over all queue entries. Our tool includes a QuTL checker that
determines that every concretization of $\hat{s}$ violates property (2), concluding that $\hat{s}$
is spurious and can be discarded. This turns out to be sufficient for convergence.
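To see why such an abstract state is spurious, one can check invariant (2) on concrete queues and enumerate concretizations up to a length bound. This is only a bounded illustration under stated assumptions, not the QuTL checker of Sect. 5, which decides the question for queues of every length. The abstract queue s_hat below is a hypothetical example with DONE before PRIME (p = 0), and we use the fact that, for p = 0, a concrete queue is a concretization exactly when its first occurrences, in order, spell the abstract queue (as the regular expression of Definition 3 in Sect. 4.1 describes).

```python
from itertools import product


def satisfies_inv(queue):
    """Check G(DONE -> G !PRIME) on a finite concrete queue:
    no PRIME may occur at or after a position holding DONE."""
    seen_done = False
    for e in queue:
        if e == "DONE":
            seen_done = True
        elif seen_done and e == "PRIME":
            return False
    return True


def first_occurrences(queue):
    """List abstraction with p = 0."""
    out = []
    for e in queue:
        if e not in out:
            out.append(e)
    return tuple(out)


# Hypothetical spurious abstract Receiver queue (p = 0): DONE followed by PRIME.
s_hat = ("DONE", "PRIME")
events = ("PRIME", "DONE", "PING")
concretizations = [
    q for n in range(1, 8) for q in product(events, repeat=n) if first_occurrences(q) == s_hat
]
print(len(concretizations) > 0, any(satisfies_inv(q) for q in concretizations))  # True False
```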


3  Queue-(Un)Bounded Reachability Analysis 


Communicating Queue Systems. We consider P programs consisting of a 
fixed and known number n of machines communicating via event passing through 
unbounded FIFO queues.¹ For simplicity, we assume the machines are created
at the start of the program; dynamic creation at a later time can be simulated 
by having the machine ignore all events until it receives a special creation event. 

We model such a program as a communicating queue system (CQS). Formally,
given $n \in \mathbb{N}$, a CQS $\mathcal{P}^n$ is a collection of n queue automata (QA)
$\mathcal{P}_i = (\Sigma, L_i, Act_i, \Delta_i, \ell^I_i)$, $1 \le i \le n$. A QA consists of a finite queue
alphabet $\Sigma$ shared by all QA, a finite set $L_i$ of local states, a finite set $Act_i$ of action
labels, a finite set $\Delta_i \subseteq L_i \times (\Sigma \cup \{\epsilon\}) \times Act_i \times L_i \times (\Sigma \cup \{\epsilon\})$ of transitions,
and an initial local state $\ell^I_i \in L_i$. An action label $act \in Act_i$ is of the form


- $act \in \{deq, loc\}$, denoting an action internal to $\mathcal{P}_i$ (no other QA involved)
that either dequeues an event (deq), or updates its local state (loc); or

- $act = !(e, j)$, for $e \in \Sigma$, $j \in \{1, \ldots, n\}$, denoting a transmission, where $\mathcal{P}_i$
(the sender) adds event e to the end of the queue of $\mathcal{P}_j$ (the receiver).


The individual QA of a CQS model machines of a P program; hence we refer 
to QA states as machine states. A transmit action is the only communication 
mechanism among the QA. 


Semantics. A machine state m of a QA is of the form $(\ell, Q) \in L \times \Sigma^*$; state
$m^0 = (\ell^I, \epsilon)$ is initial. We define machine transitions corresponding to internal
actions as follows (transmit actions are defined later at the global level):


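$$\frac{(\ell, \epsilon) \xrightarrow{\mathit{loc}} (\ell', \epsilon) \in \Delta}{(\ell, Q) \rightarrow (\ell', Q)} \quad \text{for } \ell, \ell' \in L,\ Q \in \Sigma^* \qquad \text{(local)}$$

$$\frac{(\ell, e) \xrightarrow{\mathit{deq}} (\ell', \epsilon) \in \Delta}{(\ell, eQ) \rightarrow (\ell', Q)} \quad \text{for } \ell, \ell' \in L,\ e \in \Sigma,\ Q \in \Sigma^* \qquad \text{(dequeue)}$$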


1 The P language permits unbounded machine creation, a feature that we do not allow 
here and that is not used in any of the benchmarks we are aware of. 
A (global) state s of a CQS is a tuple $((\ell_1, Q_1), \ldots, (\ell_n, Q_n))$ where $(\ell_i, Q_i) \in
L_i \times \Sigma^*$ for $i \in \{1, \ldots, n\}$. State $s^0 = ((\ell^I_1, \epsilon), \ldots, (\ell^I_n, \epsilon))$ is initial. We extend
the machine transition relation $\rightarrow$ to states as follows:


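$$((\ell_1, Q_1), \ldots, (\ell_n, Q_n)) \rightarrow ((\ell'_1, Q'_1), \ldots, (\ell'_n, Q'_n))$$
if there exists $i \in \{1, \ldots, n\}$ such that one of the following holds:
(internal) $(\ell_i, Q_i) \rightarrow (\ell'_i, Q'_i)$, and for all $k \in \{1, \ldots, n\} \setminus \{i\}$, $\ell'_k = \ell_k$ and $Q'_k = Q_k$;

(transmission) there exist $j \in \{1, \ldots, n\}$ and $e \in \Sigma$ such that:
1. $(\ell_i, \epsilon) \xrightarrow{!(e, j)} (\ell'_i, e) \in \Delta_i$;
2. $Q'_j = Q_j \cdot e$;
3. $\ell'_k = \ell_k$ for all $k \in \{1, \ldots, n\} \setminus \{i\}$; and
4. $Q'_k = Q_k$ for all $k \in \{1, \ldots, n\} \setminus \{j\}$.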


The execution model of a CQS is strictly interleaving. That is, in each step, one
of the two above transitions (internal) or (transmission) is performed for a
nondeterministically chosen machine i.


Queue-Bounded and Queue-Unbounded Reachability. Given a CQS $\mathcal{P}^n$,
a state $s = ((\ell_1, Q_1), \ldots, (\ell_n, Q_n))$, and a number k, the queue-bounded
reachability problem (for s and k) determines whether s is reachable under queue bound
k, i.e. whether there exists a path $s_0 \rightarrow s_1 \rightarrow \ldots \rightarrow s_z$ such that $s_0 = s^0$, $s_z = s$, and
for $i \in \{0, \ldots, z\}$, all queues in state $s_i$ have at most k events. Queue-bounded
reachability for k is trivially decidable, by making enqueue actions for queues of
size k blocking (the sender cannot continue), which results in a finite state space.
We write $R_k = \{s : s \text{ is reachable under queue bound } k\}$.
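Once sends to full queues are blocked, queue-bounded reachability is a plain breadth-first search. A minimal sketch, assuming a hypothetical Python encoding of a CQS: each transition is a 5-tuple (ℓ, input, act, ℓ', output) mirroring $\Delta_i$, EPS = None plays the role of $\epsilon$, act is "loc", "deq", or ("send", e, j), and machines are indexed from 0 rather than 1.

```python
from collections import deque
from typing import Iterator, List, Optional, Tuple, Union

EPS = None                                                  # stands for the empty word
Act = Union[str, Tuple[str, str, int]]                      # "loc", "deq", or ("send", e, j)
Trans = Tuple[str, Optional[str], Act, str, Optional[str]]  # (l, input, act, l', output)
State = Tuple[Tuple[str, Tuple[str, ...]], ...]             # ((l_1, Q_1), ..., (l_n, Q_n))


def successors(s: State, delta: List[List[Trans]], k: int) -> Iterator[State]:
    """Interleaving successors of s under queue bound k."""
    for i, (loc, q) in enumerate(s):
        for (l, inp, act, l2, _out) in delta[i]:
            if l != loc:
                continue
            if act == "loc":                          # (internal) local step
                yield s[:i] + ((l2, q),) + s[i + 1:]
            elif act == "deq":                        # (internal) dequeue head event inp
                if q and q[0] == inp:
                    yield s[:i] + ((l2, q[1:]),) + s[i + 1:]
            else:                                     # (transmission) !(e, j)
                _, e, j = act
                if len(s[j][1]) >= k:                 # receiver queue full: send blocks
                    continue
                new = list(s)
                new[i] = (l2, new[i][1])
                new[j] = (new[j][0], new[j][1] + (e,))
                yield tuple(new)


def reach(init: State, delta: List[List[Trans]], k: int) -> set:
    """R_k: all states reachable from init with every queue holding at most k events."""
    seen = {init}
    todo = deque([init])
    while todo:
        for t in successors(todo.popleft(), delta, k):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


# Hypothetical system: machine 0 sends 'a' to machine 1 forever; machine 1 dequeues 'a'.
delta = [
    [("s", EPS, ("send", "a", 1), "s", "a")],
    [("r", "a", "deq", "r", EPS)],
]
init = (("s", ()), ("r", ()))
print([len(reach(init, delta, k)) for k in range(4)])  # [1, 2, 3, 4]
```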

Queue-bounded reachability will be used in this paper as a tool for solving our
actual problem of interest: Given a CQS $\mathcal{P}^n$ and a state s, the Queue-UnBounded
reachability Analysis (QUBA) problem determines whether s is reachable, i.e.
whether there exists a (queue-unbounded) path from $s^0$ to s. The QUBA problem
is undecidable [8]. We write $R \; (= \bigcup_{k \in \mathbb{N}} R_k)$ for the set of reachable states.


4 Convergence via Partial Abstract Transformers 


In this section, we formalize our approach to detecting the convergence of a
suitable sequence of observations about the states $R_k$ reachable under k-bounded
semantics. We define the observations as abstractions of those states, resulting
in sets $\hat{R}_k$. We then investigate the convergence of the sequence $(\hat{R}_k)_{k=0}^{\infty}$.


4.1 List Abstractions of Queues 


Our abstraction function applies to queues, as defined below. Its action on
machine and system states then follows from the hierarchical design of a CQS.
Let $|Q|$ denote the number of events in Q, and $Q[i]$ the ith event in Q
($0 \le i < |Q|$).
Definition 1. For a parameter $p \in \mathbb{N}$, the list abstraction function $\alpha_p : \Sigma^* \rightarrow
\Sigma^*$ is defined as follows:

1. $\alpha_p(\epsilon) = \epsilon$.
2. For a non-empty queue $Q = P \cdot e$,

$$\alpha_p(Q) = \begin{cases} \alpha_p(P) & \text{if there exists } j \text{ s.t. } p \le j < |P| \text{ and } Q[j] = e \\ \alpha_p(P) \cdot e & \text{otherwise} \end{cases} \qquad (3)$$


Intuitively, $\alpha_p$ abstracts a queue by leaving its first p events unchanged (an
idea also used in [16]). Starting from position p it keeps only the first occurrence
of each event e in the queue, if any; repeat occurrences are dropped.² The
preservation of existence and order of the first occurrences of all present events
motivates the term list abstraction. An alternative is an abstraction that keeps
only the set (not: list) of queue elements from position p, i.e. it ignores
multiplicity and order. This is by definition less precise than the list abstraction and
provided no efficiency advantages in our experiments. An abstraction that keeps
only the queue head proved cheap but too imprecise.

The motivation for parameter p is that many protocols proceed in rounds 
of repeating communication patterns, involving a bounded number of message 
exchanges. If p exceeds that number, the list abstraction’s loss of information 
may be immaterial. 

We write an abstract queue $\hat{Q} = \alpha_p(Q)$ in the form pref | suff s.t. $p = |pref|$,
and refer to pref as $\hat{Q}$'s prefix (shared with Q), and suff as $\hat{Q}$'s suffix.


Example 2. The queues $Q \in \{bbbba, bbba, bbbaa\}$ are $\alpha_2$-equivalent: $\alpha_2(Q) =
bb \mid ba$.
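The recursive Definition 1 can equivalently be computed in a single left-to-right pass over the queue. The following Python sketch is our own illustration, not the tool's implementation (PAT is written in C#); it assumes queues are strings of one-character event names, and the names `alpha` and `split_abstract` are hypothetical.

```python
from typing import Tuple


def alpha(queue: str, p: int) -> str:
    """List abstraction alpha_p (Definition 1), for queues given as strings
    of one-character events (an assumption made for illustration)."""
    prefix = queue[:p]            # the first p events are kept unchanged
    seen = set()
    suffix = []
    for e in queue[p:]:
        if e not in seen:         # keep only the first occurrence from position p on
            seen.add(e)
            suffix.append(e)
    return prefix + "".join(suffix)


def split_abstract(abstract: str, p: int) -> Tuple[str, str]:
    """Split an abstract queue into the pair (pref, suff) of the pref | suff notation."""
    return abstract[:p], abstract[p:]


# Example 2: all three queues are alpha_2-equivalent, with abstraction bb | ba.
for q in ["bbbba", "bbba", "bbbaa"]:
    print(q, "->", split_abstract(alpha(q, 2), 2))
```

The single pass mirrors the recursion: an event at position $j \ge p$ is appended exactly when it has not occurred at an earlier position $\ge p$.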


We extend $\alpha_p$ to act on a machine state via $\alpha_p(\ell_i, Q_i) = (\ell_i, \alpha_p(Q_i))$, on a
state via $\alpha_p(s) = ((\ell_1, \alpha_p(Q_1)), \ldots, (\ell_n, \alpha_p(Q_n)))$, and on a set of states
pointwise via $\alpha_p(S) = \{\alpha_p(s) : s \in S\}$.


Discussion. The abstract state space is finite since the queue prefix is of fixed
size, and each event in the suffix is recorded at most once (the event alphabet is
finite). The sets of reachable abstract states grow monotonically with increasing
queue size bound $k$, since the sets of reachable concrete states do:

$$k_1 \le k_2 \;\Rightarrow\; R_{k_1} \subseteq R_{k_2} \;\Rightarrow\; \alpha_p(R_{k_1}) \subseteq \alpha_p(R_{k_2}).$$

Finiteness and monotonicity guarantee convergence of the sequence of reachable
abstract states.

We say the abstraction function $\alpha_p$ respects a property of a state if, for any two
$\alpha_p$-equivalent states (see Example 2), the property holds for both or for neither.
Function $\alpha_p$ respects properties that refer to the local-state part of a machine, and
to the first $p + 1$ events of its queue (which are preserved by $\alpha_p$). In addition, the
property may look beyond the prefix and refer to the existence of events in the
queue, but not their frequency or their order after the first occurrence.


² Note that the head of the queue is always preserved by $\alpha_p$, even for $p = 0$.
The rich information preserved by the abstraction (despite being finite-state)
especially pays off in connection with the defer feature in the P language, which 
allows machines to delay handling certain events at the head of a queue [11]. The 
machine identifies the first non-deferred event in the queue, a piece of information 
that is precisely preserved by the list abstraction (no matter what p). 


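Definition 3. Given an abstract queue $\bar{Q} = e_0 \ldots e_{p-1} \mid e_p \ldots e_{z-1}$, the
concretization function $\gamma_p : \Sigma^* \to 2^{\Sigma^*}$ maps $\bar{Q}$ to the language of the regular
expression

$$RE_p(\bar{Q}) := e_0 \ldots e_{p-1}\, e_p \{e_p\}^*\, e_{p+1} \{e_p, e_{p+1}\}^* \ldots e_{z-1} \{e_p, \ldots, e_{z-1}\}^*, \qquad (4)$$

i.e. $\gamma_p(\bar{Q}) := \mathcal{L}(RE_p(\bar{Q}))$.

As a special case, $RE_p(\varepsilon) = \varepsilon$ and so $\gamma_p(\varepsilon) = \mathcal{L}(\varepsilon) = \{\varepsilon\}$ for the empty
queue. We extend $\gamma_p$ to act on abstract (machine or global) states in a way
analogous to the extension of $\alpha_p$, by moving it inside to the queues occurring in
those states.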
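Membership in $\gamma_p(\bar{Q})$ can be tested by translating $RE_p(\bar{Q})$ from Eq. (4) into an ordinary regular expression. A minimal Python sketch, again assuming one-character event names; `concretization_regex` and `in_concretization` are hypothetical helper names:

```python
import re


def concretization_regex(pref: str, suff: str) -> str:
    """Translate RE_p(pref | suff) from Eq. (4) into a Python regular expression."""
    parts = [re.escape(pref)]                    # e_0 ... e_{p-1}, kept verbatim
    for i, e in enumerate(suff):
        parts.append(re.escape(e))               # first occurrence of the i-th suffix event
        seen_so_far = suff[: i + 1]              # e_p, ..., up to this event, may now repeat
        parts.append("[" + re.escape(seen_so_far) + "]*")
    return "".join(parts)


def in_concretization(queue: str, pref: str, suff: str) -> bool:
    """True iff queue is in gamma_p(pref | suff)."""
    return re.fullmatch(concretization_regex(pref, suff), queue) is not None


print(concretization_regex("bb", "ba"))          # bbb[b]*a[ba]*
print(in_concretization("bbbab", "bb", "ba"))    # True
```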


4.2 Abstract Convergence Detection 


Recall that finiteness and monotonicity of the sequence $(\bar{R}_k)_{k=0}^{\infty}$ guarantee its
convergence, so it is natural to try to compute the limit. We
summarize our overall procedure to do so in Algorithm 1. The procedure iteratively
increases the queue bound $k$ and computes the concrete and (per $\alpha_p$-projection)
the abstract reachability sets $R_k$ and $\bar{R}_k$. If, for some $k$, an error is
detected, the procedure terminates (Lines 4–5; in practice implemented as an
on-the-fly check).


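Algorithm 1. Queue-unbounded reachability analysis
Input: CQS with transition relation $\to$, $p \in \mathbb{N}$, property $\Phi$ respected by $\alpha_p$.
1: compute $R_0$; $\bar{R}_0 := \alpha_p(R_0)$
2: for $k := 1$ to $\infty$ do
3:   compute $R_k$; $\bar{R}_k := \alpha_p(R_k)$
4:   if $\exists r \in R_k : r \not\models \Phi$ then
5:     return “error reachable with queue bound $k$”
6:   if $|\bar{R}_k| = |\bar{R}_{k-1}|$ then
7:     $T := (\alpha_p \circ \mathit{Im}_{\mathit{deq}} \circ \gamma_p)(\bar{R}_k)$   ▷ partial best abstract transformer
8:     if $T \subseteq \bar{R}_k$ then
9:       return “safe for any queue bound”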


The key idea of the algorithm lies in Lines 6–9 (all claims are proved as part of
Theorem 4 below). If the computation of $\bar{R}_k$ reveals no new abstract states in
round $k$ (Line 6; by monotonicity, “same size” implies “same sets”), we apply the
best abstract transformer [9,27] $\overline{\mathit{Im}} := \alpha_p \circ \mathit{Im}_{\to} \circ \gamma_p$ to $\bar{R}_k$: if the result is
contained in $\bar{R}_k$, the abstract reachability sequence has converged. However, we can
do better: we can restrict the successor function $\mathit{Im}_{\to}$ of the CQS to dequeue actions,
denoted $\mathit{Im}_{\mathit{deq}}$ in Line 7. The underlying reason is that firing a local or transmit
action on two $\alpha_p$-equivalent states $r$ and $s$ again results in $\alpha_p$-equivalent states $r'$
and $s'$. This fact does not hold for dequeue actions: the successors $r'$ and $s'$ of
dequeues depend on the abstracted parts of $r$ and $s$, resp., which may differ and
become “visible” during the dequeue (e.g. the event behind the queue head moves
into the head position). Our main result therefore is: if $\bar{R}_k = \bar{R}_{k-1}$ and dequeue
actions do not create new abstract states (Lines 7 and 8), the sequence $(\bar{R}_k)_{k=0}^{\infty}$
has converged:


Theorem 4. If $\bar{R}_k = \bar{R}_{k-1}$ and $T \subseteq \bar{R}_k$, then for any $K \ge k$, $\bar{R}_K = \bar{R}_k$.


If the sequence of reachable abstract states has converged, then all reachable
concrete states (any $k$) belong to $\gamma_p(\bar{R}_k)$ (for the current $k$). Since the abstraction
function $\alpha_p$ respects property $\Phi$, we know that if any reachable concrete state
violated $\Phi$, so would any other concrete state that maps to the same abstraction.
However, for each abstract state in $\bar{R}_k$, Line 4 has examined at least one state $r$
in its concretization; a violation was not found. We conclude:


Corollary 5. Line 9 of Algorithm 1 correctly asserts that no reachable concrete
state of the given CQS violates $\Phi$.


The corollary (along with the earlier statement about Lines 4-5) confirms 
the partial correctness of Algorithm 1. The procedure is, however, necessarily
incomplete: if no error is detected and the convergence condition in Line 8 never 
holds, the for loop will run forever. 
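The control flow of Algorithm 1 can be sketched as a generic driver. In this Python sketch, `reachable(k)` stands for a bounded explorer such as PTester, and `alpha`, `abstract_deq_post`, and `holds` are caller-supplied placeholders for $\alpha_p$, the partial best abstract transformer of Line 7 (applied per abstract state), and the property $\Phi$. All of these names, and the `max_k` cap that replaces the unbounded loop, are assumptions for illustration.

```python
from typing import Any, Callable, Iterable, Set, Tuple

State = Any  # any hashable state representation (assumption)


def pat(reachable: Callable[[int], Set[State]],
        alpha: Callable[[State], State],
        abstract_deq_post: Callable[[State], Iterable[State]],
        holds: Callable[[State], bool],
        max_k: int = 50) -> Tuple[str, int]:
    """Sketch of Algorithm 1; returns ("error" | "safe" | "unknown", k)."""
    abs_prev = {alpha(r) for r in reachable(0)}                    # Line 1
    for k in range(1, max_k + 1):                                  # Line 2, capped at max_k
        concrete = reachable(k)                                    # Line 3
        abs_k = {alpha(r) for r in concrete}
        if any(not holds(r) for r in concrete):                    # Lines 4-5
            return "error", k
        if len(abs_k) == len(abs_prev):                            # Line 6: same size => same set
            t = {s for a in abs_k for s in abstract_deq_post(a)}   # Line 7
            if t <= abs_k:                                         # Line 8
                return "safe", k                                   # Line 9
        abs_prev = abs_k
    return "unknown", max_k                                        # no verdict within the cap


# Toy instance (hypothetical): one queue over {a}, no local state, p = 0.
# R_k = {a^n : n <= k}; alpha_0 keeps only the head; dequeuing from the
# abstract queue "a" may leave "" or "a".
reach = lambda k: {"a" * n for n in range(k + 1)}
alpha0 = lambda q: q[:1]
post = lambda a: {"", "a"} if a else set()
print(pat(reach, alpha0, post, lambda r: True))        # ('safe', 2)
```

In the toy run, $\bar{R}_1 = \bar{R}_2 = \{\varepsilon, a\}$ and the dequeue image adds nothing new, so convergence is reported at $k = 2$.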

We conclude this part with two comments. First, note that we do not compute
the sets $\bar{R}_k$ as reachability fixpoints in the abstract domain (i.e. the domain
of $\alpha_p$). Instead, we compute the concrete reachability sets first, and then obtain
the $\bar{R}_k$ via projection (Lines 1 and 3). The reason is that the projection gives us the
exact set of abstractions of reachable concrete states, while an abstract fixpoint
likely overapproximates (for instance, the best abstract transformer from Line 7
does) and loses precision. Note that a primary motivation for computing abstract
fixpoints, namely that the concrete fixpoint may not be computable, does not
apply here: the concrete domains are finite, for each $k$.

Second, we observe that this projection technique comes with a cost: the sequence
$(\bar{R}_k)_{k=0}^{\infty}$ may stutter at intermediate moments: $\bar{R}_k \subsetneq \bar{R}_{k+1} = \bar{R}_{k+2} \subsetneq \bar{R}_{k+3}$.
The reason is that $\bar{R}_{k+3}$ is not obtained as a functional image of $\bar{R}_{k+2}$, but by
projection from $R_{k+3}$. As a consequence, we cannot short-cut the convergence
detection by just “waiting” for $(\bar{R}_k)_{k=0}^{\infty}$ to stabilize, despite the finite domain.


4.3 Computing Partial Best Abstract Transformers 


Recall that in Line 7 we compute

$$T = \overline{\mathit{Im}}_{\mathit{deq}}(\bar{R}_k) = (\alpha_p \circ \mathit{Im}_{\mathit{deq}} \circ \gamma_p)(\bar{R}_k). \qquad (5)$$
The line applies the best abstract transformer, restricted to dequeue actions,
to $\bar{R}_k$. This result cannot be computed as defined in (5), since $\gamma_p(\bar{R}_k)$ is typically
infinite. However, $\bar{R}_k$ is finite, so we can iterate over $\bar{r} \in \bar{R}_k$, and little information
is actually needed to determine the abstract successors of $\bar{r}$. The “infinite
fragment” of $\bar{r}$ remains unchanged, which makes the action implementable.

Formally, let $\bar{r} = (\ell, \bar{Q})$ with $\bar{Q} = e_0 e_1 \ldots e_{p-1} \mid e_p e_{p+1} \ldots e_{z-1}$. To apply a
dequeue action to $\bar{r}$, we first perform local-state updates on $\ell$ as required by the
action, resulting in $\ell'$. Now consider $\bar{Q}$. The first suffix event, $e_p$, moves into the
prefix due to the dequeue. We do not know whether there are later occurrences
of $e_p$ before or after the first suffix occurrences of $e_{p+1} \ldots e_{z-1}$. This information
determines the possible abstract queues resulting from the dequeue. To compute
the exact best abstract transformer, we enumerate these possibilities:


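$$
\overline{\mathit{Im}}_{\mathit{deq}}(\{(\ell, \bar{Q})\}) = \left\{ (\ell', g) \;:\; g \in \left\{
\begin{array}{l}
e_1 \ldots e_p \mid e_{p+1} e_{p+2} \ldots e_{z-1}, \\
e_1 \ldots e_p \mid \boxed{e_p}\, e_{p+1} e_{p+2} \ldots e_{z-1}, \\
e_1 \ldots e_p \mid e_{p+1} \boxed{e_p}\, e_{p+2} \ldots e_{z-1}, \\
\qquad \vdots \\
e_1 \ldots e_p \mid e_{p+1} e_{p+2} \ldots e_{z-1} \boxed{e_p}
\end{array}
\right\} \right\}
$$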


The first case for $g$ applies if there are no occurrences of $e_p$ in the suffix
after the dequeue. The remaining cases enumerate possible positions of the first
occurrence of $e_p$ (boxed, for readability) in the suffix after the dequeue. The cost
of this enumeration is linear in the length of the suffix of the abstract queue.
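The enumeration above translates directly into code. The Python sketch below works on the queue component only (the local-state update to $\ell'$ is omitted); abstract queues are hypothetical `(pref, suff)` string pairs, and the handling of queues with an empty suffix, where the dequeue simply drops the head, is our own addition for completeness.

```python
from typing import Set, Tuple

AbsQueue = Tuple[str, str]   # (pref, suff), i.e. pref | suff


def abstract_dequeue(pref: str, suff: str) -> Set[AbsQueue]:
    """Abstract queues that may result from dequeuing the head of pref | suff.

    Assumes |pref| = p whenever suff is non-empty (as for any alpha_p image).
    """
    if not pref and not suff:
        return set()                          # empty queue: no dequeue possible
    if not suff:
        return {(pref[1:], "")}               # queue fully inside the prefix: exact
    e_p, rest = suff[0], suff[1:]
    new_pref = (pref + e_p)[1:]               # e_1 ... e_p: e_p moves into the prefix
    results = {(new_pref, rest)}              # first case: no further occurrence of e_p
    for i in range(len(rest) + 1):            # remaining cases: first later e_p at position i
        results.add((new_pref, rest[:i] + e_p + rest[i:]))
    return results                            # len(suff) + 1 results: linear in the suffix


print(sorted(abstract_dequeue("bb", "ba")))   # [('bb', 'a'), ('bb', 'ab'), ('bb', 'ba')]
```

For $\bar{Q} = bb \mid ba$, whose concretizations are $bbb\{b\}^*a\{b,a\}^*$, the three results match the three possible suffix shapes after removing the head.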

Since our list abstraction maintains the first occurrence of each event, the 
semantics of defer (see the Discussion in Sect. 4.1) can be implemented abstractly 
without loss of information (not shown above, for simplicity). 


5 Abstract Queue Invariant Checking 


The abstract transformer function in Sect. 4 is used to decide whether the sequence
$(\bar{R}_k)_{k=0}^{\infty}$ has converged. Being an overapproximation, the function may generate
spurious states: they are not reachable, i.e. none of their concretizations is.
Unfortunately, spurious abstract states always prevent convergence.

A key empirical observation is that concretizations of spurious abstract states
often violate simple machine invariants, which can be proved from the perspective
of a single machine, while collapsing all other machines into a nondeterministically
behaving environment. Consider our example from Sect. 2 for $p = 0$. It
fails to converge since Line 7 generates an abstract state $\bar{s}$ that features a DONE
event followed by a PRIME event in the Receiver’s queue. A light-weight static
analysis proves that the Sender’s machine permits no path from the send DONE
to the send PRIME statement. Since every concretization of $\bar{s}$ features a DONE
followed by a PRIME event, the abstract state $\bar{s}$ is spurious and can be eliminated.
Our tool assists users in discovering candidate machine invariants, by facilitating
the inspection of states in $T \setminus \bar{R}_k$ (which foil the test in Line 8). We discharge
such invariants separately, via a simple sequential model check or static
analysis. In this section we focus on the more interesting question of how to use
them. Formally, suppose the P program comes with a queue invariant $I$, i.e. an
invariant property of concrete queues. The abstract invariant checking problem
is to decide, for a given abstract queue $\bar{Q}$, whether every concretization of $\bar{Q}$
violates $I$; in this case, and this case only, an abstract state containing $\bar{Q}$ can be
eliminated. In the following we define a language QuTL for specifying concrete
queue invariants (Sect. 5.1), and then show how checking an abstract queue against a
QuTL invariant can be efficiently solved as a model checking problem (Sect. 5.2).


5.1 Queue Temporal Logic (QuTL) 


Our logic to express invariant properties of queues is a form of first-order linear- 
time temporal logic. This choice is motivated by the logic’s ability to constrain 
the order (via temporal operators) and multiplicity of queue events, the latter via 
relational operators that express conditions on the number of event occurrences. 


Queue Relational Expressions (QuRelE). These are of the form $\#e \bowtie c$, where
$e \in \Sigma$ (queue alphabet), ${\bowtie} \in \{<, \le, =, \ge, >\}$, and $c \in \mathbb{N}$ is a literal natural
number. The value of a QuRelE is defined as the Boolean

$$V(\#e \bowtie c) = \big|\{\, i \in \mathbb{N} : 0 \le i < |Q| \wedge Q[i] = e \,\}\big| \bowtie c \qquad (6)$$

where $|\cdot|$ denotes set cardinality and $\bowtie$ is interpreted as the standard integer
arithmetic relational operator. In the following we write $Q[i\to]$ (read: “$Q$ from $i$”)
for the queue obtained from queue $Q$ by dropping the first $i$ events.


Definition 6 (Syntax of QuTL). The following are QuTL formulas:

– $\mathit{false}$ and $\mathit{true}$.
– $e$, for $e \in \Sigma$.
– $E$, for a queue relational expression $E$.
– $X\phi$, $F\phi$, $G\phi$, for a QuTL formula $\phi$.

The set QuTL is the Boolean closure of the above set of formulas.


Definition 7 (Concrete semantics of QuTL). Concrete queue $Q$ satisfies
QuTL formula $\phi$, written $Q \models \phi$, depending on the form of $\phi$ as follows.

– $Q \models \mathit{true}$.
– For $e \in \Sigma$, $Q \models e$ iff $|Q| > 0$ and $Q[0] = e$.
– For a queue relational expression $E$, $Q \models E$ iff $V(E) = \mathit{true}$.
– $Q \models X\phi$ iff $|Q| > 0$ and $Q[1\to] \models \phi$.
– $Q \models F\phi$ iff there exists $i \in \mathbb{N}$ such that $0 \le i < |Q|$ and $Q[i\to] \models \phi$.
– $Q \models G\phi$ iff for all $i \in \mathbb{N}$ such that $0 \le i < |Q|$, $Q[i\to] \models \phi$.

Satisfaction of Boolean combinations is defined as usual, e.g. $Q \models \neg\phi$ iff $Q \not\models \phi$.
No other pair $(Q, \phi)$ satisfies $Q \models \phi$.


For instance, formula $\#e \le 3$ is true exactly for queues containing at most
3 $e$’s, and formula $G(\#e \ge 1)$ is true of $Q$ iff $Q$ is empty or its final event (!) is $e$.
See App. B of [23] for more examples.
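Definition 7 is directly executable on concrete queues. The following Python sketch encodes formulas as nested tuples, a representation we chose for illustration only; the constructor tags (`"ev"`, `"cnt"`, `"X"`, and so on) are hypothetical, and one-character events are assumed.

```python
import operator

# Relational operators allowed in queue relational expressions #e ~ c.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def sat(q: str, phi: tuple) -> bool:
    """Concrete QuTL satisfaction Q |= phi (Definition 7)."""
    kind = phi[0]
    if kind == "true":
        return True
    if kind == "false":
        return False
    if kind == "ev":                       # Q |= e iff |Q| > 0 and Q[0] = e
        return len(q) > 0 and q[0] == phi[1]
    if kind == "cnt":                      # #e ~ c, evaluated as in Eq. (6)
        _, e, op, c = phi
        return OPS[op](q.count(e), c)
    if kind == "not":
        return not sat(q, phi[1])
    if kind == "and":
        return sat(q, phi[1]) and sat(q, phi[2])
    if kind == "or":
        return sat(q, phi[1]) or sat(q, phi[2])
    if kind == "X":                        # Q[1->] |= phi, for non-empty Q
        return len(q) > 0 and sat(q[1:], phi[1])
    if kind == "F":                        # some suffix Q[i->], 0 <= i < |Q|
        return any(sat(q[i:], phi[1]) for i in range(len(q)))
    if kind == "G":                        # every suffix Q[i->], 0 <= i < |Q|
        return all(sat(q[i:], phi[1]) for i in range(len(q)))
    raise ValueError(f"unknown formula {phi!r}")


# The two examples from the text.
at_most_three_e = ("cnt", "e", "<=", 3)
last_is_e = ("G", ("cnt", "e", ">=", 1))
print(sat("eee", at_most_three_e), sat("eeee", at_most_three_e))        # True False
print(sat("", last_is_e), sat("abe", last_is_e), sat("eab", last_is_e))  # True True False
```

Note that, per Definition 7, $G$ holds vacuously and $F$ fails on the empty queue.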

Algorithmically checking whether a concrete queue $Q$ satisfies a QuTL formula
$\phi$ is straightforward, since $Q$ is of fixed size and straight-line. The situation is
different with abstract queues. Our motivation here is to declare that an abstract queue
$\bar{Q}$ violates a formula $\phi$ if all its concretizations (Definition 3) do: under this
condition, if $\phi$ is an invariant, we know $\bar{Q}$ is not reachable. Equivalently:


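Definition 8 (Abstract semantics of QuTL). Abstract queue $\bar{Q}$ satisfies
QuTL formula $\phi$, written $\bar{Q} \models_p \phi$, if some concretization of $\bar{Q}$ satisfies $\phi$:

$$\bar{Q} \models_p \phi \quad :\Leftrightarrow \quad \exists Q \in \gamma_p(\bar{Q}) : Q \models \phi. \qquad (7)$$

For example, we have $bb \mid ba \models_2 G(a \Rightarrow G\,\neg b)$ since for instance $bbba \in \gamma_2(bb \mid ba)$
satisfies the formula. See App. B of [23] for more examples.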
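Definition 8 suggests a naive, bounded search: enumerate concretizations up to some length and test each one. The Python sketch below does exactly that; it is only a semi-decision aid for experimentation (a `None` result proves nothing beyond the chosen bound `max_len`, a hypothetical parameter), whereas Sect. 5.2 decides $\models_p$ exactly via an LTS. The predicate `holds` stands for any concrete check of $Q \models \phi$.

```python
from itertools import product
from typing import Callable, Iterator, Optional


def concretizations(pref: str, suff: str, max_len: int) -> Iterator[str]:
    """Enumerate the words of gamma_p(pref | suff) (Eq. (4)) of length <= max_len."""
    def extend(word: str, i: int) -> Iterator[str]:
        if i == len(suff):
            yield word
            return
        word += suff[i]                        # mandatory first occurrence of suff[i]
        allowed = suff[: i + 1]                # events that may repeat at this point
        budget = max_len - len(word) - (len(suff) - i - 1)
        for n in range(budget + 1):            # number of repeated events inserted here
            for filler in product(allowed, repeat=n):
                yield from extend(word + "".join(filler), i + 1)

    if len(pref) + len(suff) <= max_len:       # shortest concretization is pref + suff
        yield from extend(pref, 0)


def abstract_sat_bounded(pref: str, suff: str, holds: Callable[[str], bool],
                         max_len: int) -> Optional[str]:
    """Return a witness Q in gamma_p(pref | suff) with holds(Q), searching up to max_len."""
    for q in concretizations(pref, suff, max_len):
        if holds(q):
            return q
    return None


# bb | ba |=_2 G(a => G not b), i.e. "no b after the first a": witness bbba.
no_b_after_a = lambda q: "a" not in q or "b" not in q[q.index("a"):]
print(abstract_sat_bounded("bb", "ba", no_b_after_a, 6))
```

Each word of $\gamma_p$ is produced exactly once, since a new suffix event can only appear at its mandatory first occurrence.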




Fig. 2. LTS for $\bar{Q} = bb \mid abc$ ($p = 2$), with label sets written below each state. The
blue and red parts encode the concretizations of the prefix and suffix of $\bar{Q}$, resp. (Color
figure online)


5.2 Abstract QuTL Model Checking 


A QuTL constraint is a QuTL formula without Boolean connectives. We first 
describe how to model check against QuTL constraints, and come back to 
Boolean connectives at the end of Sect. 5.2. 

Model checking an abstract queue $\bar{Q}$ against a QuTL constraint $\phi$, i.e. checking
whether some concretization of $\bar{Q}$ satisfies $\phi$, can be reduced to a standard
model checking problem over a labeled transition system (LTS) $M = (S, T, L)$
with states $S$, transitions $T$, and a labeling function $L : S \to 2^{\Sigma} \cup \{\varepsilon\}$. The LTS
characterizes the concretization $\gamma_p(\bar{Q})$ of $\bar{Q}$, as illustrated in Fig. 2 using an
example: the concretizations of $\bar{Q}$ are formed from the regular-expression traces
generated by paths of $\bar{Q}$’s LTS that end in the double-circled green state.
The straightforward construction of the LTS $M$ is formalized in App. A.2
of [23]. Its size is linear in $|\bar{Q}|$: $|S| = p + 2 \times (|\bar{Q}| - p) + 1$ and $|T| = p + 4 \times (|\bar{Q}| - p)$.
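Since the construction itself is only given in App. A.2 of [23], the Python sketch below is our own hypothetical reconstruction, chosen to match Fig. 2 and the size formulas above: one state per prefix event, two states per suffix event (the mandatory first occurrence, plus an optional self-looping state labeled with all suffix events seen so far), and a final state $s_f$. All names are hypothetical, and the empty label set stands in for $\varepsilon$.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class LTS:
    labels: List[FrozenSet[str]] = field(default_factory=list)  # L(s); frozenset() encodes epsilon
    edges: Set[Tuple[int, int]] = field(default_factory=set)     # transitions T
    final: int = -1                                              # s_f; state 0 is s_0


def build_lts(pref: str, suff: str) -> LTS:
    """Hypothetical LTS for pref | suff whose complete paths spell out RE_p (Eq. (4))."""
    m = LTS()

    def new_state(label: Set[str]) -> int:
        m.labels.append(frozenset(label))
        return len(m.labels) - 1

    spine = [new_state({e}) for e in pref]        # prefix events, in order
    loop_of = {}                                  # mandatory state -> its self-loop state
    for i, e in enumerate(suff):
        s = new_state({e})                        # first occurrence of suff[i]
        spine.append(s)
        loop_of[s] = new_state(set(suff[: i + 1]))   # optional repeats {e_p, ..., e_i}*
    m.final = new_state(set())
    spine.append(m.final)

    for a, b in zip(spine, spine[1:]):
        m.edges.add((a, b))                       # skip the optional repeats
        if a in loop_of:
            loop = loop_of[a]
            m.edges.update({(a, loop), (loop, loop), (loop, b)})   # enter, repeat, leave
    return m


m = build_lts("bb", "abc")                        # the queue of Fig. 2, p = 2
print(len(m.labels), len(m.edges))                # 9 14
print(m.labels[0])                                # frozenset({'b'}), i.e. L(s_0) = {b}
```

With $p = 2$ and $|\bar{Q}| = 5$ this yields $2 + 2 \cdot 3 + 1 = 9$ states and $2 + 4 \cdot 3 = 14$ transitions, as the formulas predict.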

We call a path through $M$ complete if it ends in the right-most state $s_f$ of $M$
(green in Fig. 2). The labeling function extends to paths via $L(s_i \to \ldots \to s_j) =
L(s_i) \cdot \ldots \cdot L(s_j)$. This gives rise to the following characterization of $\gamma_p(\bar{Q})$:

Lemma 9. Given abstract queue $\bar{Q}$ over alphabet $\Sigma$, let $M = (S, T, L)$ be
its LTS. Then

$$\gamma_p(\bar{Q}) = \bigcup \{\, \mathcal{L}(L(\pi)) \in 2^{\Sigma^*} \mid \pi \text{ is a complete path from } s_0 \text{ in } M \,\}. \qquad (8)$$

We say path $\pi$ satisfies $\phi$, written $\pi \models_p \phi$, if there exists $Q \in \mathcal{L}(L(\pi))$
s.t. $Q \models \phi$.


Corollary 10. Let $\bar{Q}$ and $M$ be as in Lemma 9, and $\phi$ a QuTL constraint. Then
the following are equivalent.

1. $\bar{Q} \models_p \phi$.
2. There exists a complete path $\pi$ from $s_0$ in $M$ such that $\pi \models_p \phi$.

Proof. Immediate from Definition 8 and Lemma 9.


Given an abstract queue $\bar{Q}$, its LTS $M$, and a QuTL constraint $\phi$, our abstract
queue model checking algorithm is based on Corollary 10: we need to find a
complete path from $s_0$ in $M$ that satisfies $\phi$. This is similar to standard model
checking against existential temporal logics like ECTL, with two particularities:

First, paths must be complete. This poses no difficulty, as completeness is
suffix-closed: a path ends in $s_f$ iff any suffix does. This implies that temporal
reductions on QuTL constraints work like in standard temporal logics. For example:
there exists a complete path $\pi$ from $s_0$ in $M$ such that $\pi \models_p X\phi$ iff there
exists a complete path $\pi'$ from some successor $s_1$ of $s_0$ such that $\pi' \models_p \phi$.

Second, we have domain-specific atomic (non-temporal) propositions. These
are accommodated as follows, for an arbitrary start state $s \in S$:

– $\exists \pi$: $\pi$ from $s$ complete and $\pi \models_p e$ (for $e \in \Sigma$): this is true iff $e \in L(s)$, as is
immediate from the $Q \models e$ case in Definition 7.
– $\exists \pi$: $\pi$ from $s$ complete and $\pi \models_p \#e > c$ (for $e \in \Sigma$, $c \in \mathbb{N}$): this is true iff
  – the number of states reachable from $s$ labeled $e$ is greater than $c$, or
  – there exists a state reachable from $s$ labeled with $e$ that has a self-loop.

The other relational expressions $\#e \bowtie c$ are checked similarly.




Boolean Connectives. Let now $\phi$ be a full-fledged QuTL formula. We first bring
it into negation normal form, by pushing negations inside, exploiting the usual
dualities $\neg X = X\neg$, $\neg F = G\neg$, and $\neg G = F\neg$. The subset ${\bowtie} \in \{<, \le, \ge, >\}$ of
the queue relational expressions is semantically closed under negation; a negated “$=$” is
replaced by “$> \vee <$”. A path $\pi$ from $s$ satisfies $\neg e$ (for $e \in \Sigma$) iff $L(s) \neq \{e\}$:
this condition states that either $L(s) = \varepsilon$, or there exists some label other than
$e$ in $L(s)$, so the existential property $\neg e$ holds.
Disjunctions are handled by distributing $\models_p$ over them: $\bar{Q} \models_p \phi_1 \vee \phi_2$ iff
$\bar{Q} \models_p \phi_1$ or $\bar{Q} \models_p \phi_2$. What remains are conjunctions. The existential flavor
of $\models_p$ implies that $\models_p$ does not distribute over them; see Ex. 13 in App. B.1
of [23]. Suppose we ignore this and replace a check of the form $\bar{Q} \models_p \phi_1 \wedge \phi_2$
by the weaker check $\bar{Q} \models_p \phi_1$ and $\bar{Q} \models_p \phi_2$, which may produce false positives.
Now consider how we use these results: if $\bar{Q} \models_p \phi$ holds, we decide to keep the
state containing the abstract queue. False positives during abstract model checks
therefore may create extra work, but do not introduce unsoundness. In summary,
our abstract model checking algorithm soundly approximates conjunctions, but
remains exact for the purely disjunctive fragment of QuTL.
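A small example of our own (not the one in App. B.1 of [23]) shows how the weaker check can yield a false positive. Take $\bar{Q} = bb \mid ba$ with $p = 2$, so that $\gamma_2(\bar{Q}) = \mathcal{L}(bbb\{b\}^*a\{b,a\}^*)$ by Eq. (4):

$$
\begin{aligned}
bbbba \in \gamma_2(\bar{Q}),\ bbbba \models \#b \ge 4 &\;\Rightarrow\; \bar{Q} \models_2 \#b \ge 4,\\
bbba \in \gamma_2(\bar{Q}),\ bbba \models \#b \le 3 &\;\Rightarrow\; \bar{Q} \models_2 \#b \le 3,\\
\text{no queue satisfies } \#b \ge 4 \wedge \#b \le 3 &\;\Rightarrow\; \bar{Q} \not\models_2 (\#b \ge 4 \wedge \#b \le 3).
\end{aligned}
$$

The weaker check therefore accepts $\bar{Q}$ for the conjunction, and the state containing $\bar{Q}$ is kept: extra work, but no loss of soundness.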


Table 1. Results: #M: number of P machines; Loc: lines of code; Safe?: ✓ = property holds, ✗ = property violated;
p: minimum unabstracted prefix required for convergence; kmax: point of convergence or
exposed bugs (– means divergence); Time: runtime (sec; TO = timeout); Mem.: memory usage (MB).

| ID/Program | #M | Loc | Safe? | p | kmax | Time | Mem. |
|---|---|---|---|---|---|---|---|
| 1/GERMAN-1 | 3 | 242 | ✓ | 4 | – | TO | – |
| 2/GERMAN-2 | 4 | 24 | ✓ | 4 | – | TO | – |
| 3/TOKENRING-BUGGY | 6 | 164 | ✗ | 0 | 2 | 241.44 | 35.96 |
| 4/TOKENRING-FIXED | 6 | 164 | ✓ | 0 | 4 | 1849.25 | 130.87 |
| 5/FAILUREDETECTOR | 6 | 229 | ✓ | 0 | 4 | 183.99 | 12.38 |
| 6/OSR | 5 | 378 | ✓ | 0 | 5 | 77.92 | 44.86 |
| 7/OPENWSN | 6 | 294 | ✓ | 2 | 5 | 2574.25 | 376.29 |
| 8/FAILOVER | 4 | 132 | ✓ | 0 | 2 | 2.91 | 8.56 |
| 9/MAXINSTANCES | 4 | 79 | ✓ | 0 | 3 | 0.14 | 0.56 |
| 10/PINGPONG | 2 | 76 | ✓ | 0 | 2 | 0.06 | 0.43 |
| 11/BOUNDEDASYNC | 4 | 96 | ✓ | 0 | 5 | 203.39 | 29.32 |
| 12/PINGFLOOD | 2 | 52 | ✓ | 4 | 5 | 0.11 | 0.43 |
| 13/ELEVATOR-BUGGY | 4 | 270 | ✗ | 0 | 1 | 1.29 | 5.23 |
| 14/ELEVATOR-FIXED | 4 | 271 | ✓ | 0 | 4 | 49.23 | 45.36 |


6 Empirical Evaluation 


We implemented the proposed approaches in C# atop the bounded model 
checker PTester [11], an analysis tool for P programs. PTester employs a bounded 
exploration strategy similar to Zing [4]. We denote by PAT the implementation 
of Algorithm 1, and by PAT+I the version with queue invariants (“PAT+ Invari- 
ants”). A detailed introduction to tool design and implementation is available 
online [22]. 


Experimental Goals. We evaluate the approaches against the following questions: 


Q1. Is PAT effective: does it converge for many programs? For what values
of k?
Q2. What is the impact of the QuTL invariant checking? 


Experimental Setup. We collected a set of P programs (available online [22]); 
most have been used in previous publications: 


1–5: protocols implemented in P: the German Cache Coherence protocol with
different numbers of clients (1–2) [11], a buggy and a fixed version of a token ring
protocol [11] (3–4), and a failure detector protocol from [25] (5).
6–7: two device drivers, where OSR is used for testing USB devices [10].
8–14: miscellaneous: 8–10 [25], 11 [15], 12 is the example from Sect. 2, and 13–14
are the buggy and fixed versions of an Elevator controller [11].


We conduct two types of experiments: (i) we run PAT on each benchmark to 
empirically answer Q1; (ii) we run PAT+I on the examples which fail to verify 
in (i) to answer Q2. All experiments are performed on a 2.80 GHz Intel(R) 
Core(TM) i7-7600 machine with 8 GB memory, running 64-bit Windows 10. 
The timeout is set to 3600 s (1h); the memory limit to 4 GB. 


Results. First, Table 1 shows that PAT converges on almost all safe examples (and
successfully exposes the bugs for unsafe ones). Second, in most cases, the kmax
where convergence was detected is small, 5 or less. This is what enables the use 
of this technique in practice: the exploration space grows fast with k, so early 
convergence is critical. Note that kmax is guaranteed to be the smallest value for 
which the respective example converges. If convergent, the verification succeeded 
fully automatically: the queue abstraction prefix parameter p is incremented in 
a loop whenever the current value of p caused a spurious abstract state. 

The GERMAN protocol does not converge in reasonable time. In this case, we 
request minimal manual assistance from the designer. Our tool inspects spurious 
abstract states, compares them to actually reached abstract states, and suggests 
candidate invariants to exclude them. We describe the process of invariant
discovery, and why and how these invariants are easy to prove, in [22].

The following table shows the invariants that make the GERMAN protocol 
converge, and the resulting times and memory consumption. 


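| Program | p | kmax | Time | Mem. | Invariant |
|---|---|---|---|---|---|
| GERMAN-1 | 0 | 4 | 15.65 | 45.65 | Server: $\#\text{req\_excl} \le 1 \wedge \#\text{req\_share} \le 1$ |
| GERMAN-2 | 0 | 4 | 629.43 | 284.75 | Client: $\#\text{ask\_excl} \le 1 \wedge \#\text{ask\_share} \le 1$ |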


The invariant states that there is always at most one exclusive request and 
at most one shared request in the Server or Client machine’s queue. 


Performance Evaluation. We finally consider the following question: To perform 
full verification, how much overhead does PAT incur compared to PTester? We 
iteratively run PTester with a queue bound from 1 up to kmax (from Table 1). 
The accompanying figure compares the running times of PAT and PTester per benchmark.

[Figure: running times of PTester and PAT, per benchmark ID]

We observe that the difference is small in all cases, suggesting that turning PTester
into a full verifier comes with little extra cost. Therefore, as for improving PAT’s
scalability, the focus should be on the efficiency of the $R_k$ computation (Line 3 in
Algorithm 1). Techniques
that lend themselves here are partial order reduction [2,28] or symmetry reduc- 
tion [29]. Note that our proposed approach is orthogonal to how these sets are 
computed. 


7 Related Work 


Automatic verification for asynchronous event-driven programs communicating 
via unbounded FIFO queues is undecidable [8], even when the agents are finite-state
machines. To sidestep the undecidability, various remedies have been proposed.
One is to underapproximate program behaviors using various bounding tech- 
niques; examples include depth- [17] and context-bounded analysis [19, 20, 26], 
delay-bounding [13], bounded asynchrony [15], preemption-bounding [24], and 
phase-bounded analysis [3,6]. It has been shown that most of these bounding 
techniques admit a decidable model checking problem [19, 20,26] and thus have 
been successfully used in practice for finding bugs. 

Le Gall et al. proposed an abstract interpretation of FIFO queues in terms of
regular languages [16]. While our works share some basic insights about taming 
queues, the differences are fundamental: our abstract domain is finite, guaran- 
teeing convergence of our sequence. In [16] the abstract domain is infinite; they 
propose a widening operator for fixpoint computation. More critically, we use 
the abstract domain only for convergence detection; the set of reachable states 
returned is in the end exact. As a result, we can prove and refute properties but 
may not terminate; [16] is inexact and cannot refute but always returns. 

Several partial verification approaches for asynchronous message-passing pro- 
grams have been presented recently [5,7,10]. In [5], Bakst et al. propose canon- 
ical sequentialization, which avoids exploring all interleavings by sequentializing 
concurrent programs. Desai et al. [10] propose an alternative way, namely by pri- 
oritizing receive actions over send actions. The approach is complete in the sense 
that it is able to construct almost-synchronous invariants that cover all reach- 
able local states and hence suffice to prove local assertions. Similarly, Bouajjani 
et al. [7] propose an iterative analysis that bounds send actions in each interaction
phase. It approaches completeness by checking a program’s synchronizability
under the bounds. Similar to our work, the above three works are sound
but incomplete. An experimental comparison against the techniques of [7,10] is
not possible, since no tool implementing them is available; a comparison based on
what is reported in the papers suggests that our approach is competitive in both
performance and precision.

Our approach can be categorized as a cutoff detection technique [1, 12, 14, 28]. 
Cutoffs are, however, typically determined statically, often leaving them too large 
for practical verification. Aiming at minimal cutoffs, our work is closer in nature 
to earlier dynamic strategies [18,21], which targeted different forms of concurrent 
programs. The generator technique proposed in [21] is unlikely to work for P 
programs, due to the large local state space of machines. 
8 Conclusion


We have presented a method to verify safety properties of asynchronous event- 
driven programs of agents communicating via unbounded queues. Our approach 
is sound but incomplete: it can both prove (or, by encountering bugs, disprove) 
such properties but may not terminate. We empirically evaluated our method on
a collection of P programs. Our experimental results show that our method can
successfully prove the correctness of programs; such proofs are achieved with little
extra resource cost compared to plain state exploration. Future work includes
an extension to P programs with other sources of unboundedness than the queue 
length (e.g. messages with integer payloads). 


Acknowledgments. We thank Dr. Vijay D’Silva (Google, Inc.), for enlightening dis- 
cussions about partial abstract transformers. 


References 


1. Abdulla, A.P., Haziza, F., Holík, L.: All for the price of few (parameterized verifi- 
cation through view abstraction). In: VMCAI, pp. 476-495 (2013) 

2. Abdulla, P., Aronis, S., Jonsson, B., Sagonas, K.: Optimal dynamic partial order 
reduction. In: POPL, pp. 373-384 (2014) 

3. Abdulla, P.A., Atig, M.F., Cederberg, J.: Analysis of message passing programs
using SMT-solvers. In: Van Hung, D., Ogawa, M. (eds.) ATVA 2013. LNCS, vol.
8172, pp. 272-286. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-02444-8_20

4. Andrews, T., Qadeer, S., Rajamani, S.K., Rehof, J., Xie, Y.: Zing: a model checker
for concurrent software. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol.
3114, pp. 484-487. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27813-9_42

5. Bakst, A., Gleissenthall, K.v., Kici, R.G., Jhala, R.: Verifying distributed programs
via canonical sequentialization. PACMPL 1(OOPSLA), 110:1-110:27 (2017)

6. Bouajjani, A., Emmi, M.: Bounded phase analysis of message-passing programs.
Int. J. Softw. Tools Technol. Transf. 16(2), 127-146 (2014)

7. Bouajjani, A., Enea, C., Ji, K., Qadeer, S.: On the completeness of verifying message
passing programs under bounded asynchrony. In: Chockler, H., Weissenbacher,
G. (eds.) CAV 2018. LNCS, vol. 10982, pp. 372-391. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-96142-2_23

8. Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM 30(2), 
323-342 (1983) 

9. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: 
POPL, pp. 269-282 (1979) 

10. Desai, A., Garg, P., Madhusudan, P.: Natural proofs for asynchronous programs 
using almost-synchronous reductions. In: OOPSLA, pp. 709-725 (2014) 

11. Desai, A., Gupta, V., Jackson, E., Qadeer, S., Rajamani, S., Zufferey, D.: P: safe 
asynchronous event-driven programming. In: PLDI, pp. 321-332 (2013) 

12. Emerson, E.A., Kahlon, V.: Reducing model checking of the many to the few. In: 
McAllester, D. (ed.) CADE 2000. LNCS (LNAI), vol. 1831, pp. 236-254. Springer, 
Heidelberg (2000). https://doi.org/10.1007/10721959_19 


13. Emmi, M., Qadeer, S., Rakamarić, Z.: Delay-bounded scheduling. In: POPL, pp.
411-422 (2011)

14. Farzan, A., Kincaid, Z., Podelski, A.: Proof spaces for unbounded parallelism. In:
POPL, pp. 407-420 (2015)

15. Fisher, J., Henzinger, T.A., Mateescu, M., Piterman, N.: Bounded asynchrony:
concurrency for modeling cell-cell interactions. In: Fisher, J. (ed.) FMSB 2008.
LNCS, vol. 5054, pp. 17-32. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68413-8_2

16. Le Gall, T., Jeannet, B., Jéron, T.: Verification of communication protocols using
abstract interpretation of FIFO queues. In: Johnson, M., Vene, V. (eds.) AMAST
2006. LNCS, vol. 4019, pp. 204-219. Springer, Heidelberg (2006). https://doi.org/10.1007/11784180_17

17. Godefroid, P.: Model checking for programming languages using VeriSoft. In:
POPL, pp. 174-186 (1997)

18. Kaiser, A., Kroening, D., Wahl, T.: Dynamic cutoff detection in parameterized
concurrent programs. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS,
vol. 6174, pp. 645-659. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_55

19. La Torre, S., Parthasarathy, M., Parlato, G.: Analyzing recursive programs using
a fixed-point calculus. In: PLDI, pp. 211-222 (2009)

20. Lal, A., Reps, T.: Reducing concurrent analysis under a context bound to sequential
analysis. Form. Methods Syst. Des. 35(1), 73-97 (2009)

21. Liu, P., Wahl, T.: CUBA: interprocedural context-unbounded analysis of concurrent
programs. In: PLDI, pp. 105-119 (2018)

22. Liu, P., Wahl, T., Lal, A.: (2019). www.khoury.northeastern.edu/home/lpzun/quba

23. Liu, P., Wahl, T., Lal, A.: Verifying asynchronous event-driven programs using
partial abstract transformers (extended manuscript). CoRR abs/1905.09996 (2019)

24. Musuvathi, M., Qadeer, S.: Iterative context bounding for systematic testing of
multithreaded programs. In: PLDI, pp. 446-455 (2007)

25. P-GitHub: The P programming language (2019). https://github.com/p-org/P

26. Qadeer, S., Rehof, J.: Context-bounded model checking of concurrent software.
In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 93-107.
Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31980-1_7

27. Reps, T., Sagiv, M., Yorsh, G.: Symbolic implementation of the best transformer.
In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 252-266.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24622-0_21

28. Sousa, M., Rodriguez, C., D’Silva, V., Kroening, D.: Abstract interpretation with
unfoldings. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp.
197-216. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_11

29. Wahl, T., Donaldson, A.: Replication and abstraction: symmetry in automated
formal verification. Symmetry 2(2), 799-847 (2010)




Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 


Inferring Inductive Invariants 
from Phase Structures 


Yotam M. Y. Feldman¹(✉), James R. Wilcox²,
Sharon Shoham¹, and Mooly Sagiv¹


¹ Tel Aviv University, Tel Aviv, Israel
yotam.feldman@gmail.com
² University of Washington, Seattle, USA


Abstract. Infinite-state systems such as distributed protocols are challenging to 
verify using interactive theorem provers or automatic verification tools. Of these 
techniques, deductive verification is highly expressive but requires the user to 
annotate the system with inductive invariants. To relieve the user from this labor- 
intensive and challenging task, invariant inference aims to find inductive invari- 
ants automatically. Unfortunately, when applied to infinite-state systems such as 
distributed protocols, existing inference techniques often diverge, which limits 
their applicability. 

This paper proposes user-guided invariant inference based on phase invariants,
which capture the different logical phases of the protocol. Users convey
their intuition by specifying a phase structure, an automaton with edges labeled 
by program transitions; the tool automatically infers assertions that hold in the 
automaton’s states, resulting in a full safety proof. The additional structure from 
phases guides the inference procedure towards finding an invariant. 

Our results show that user guidance by phase structures facilitates successful 
inference beyond the state of the art. We find that phase structures are pleas- 
antly well matched to the intuitive reasoning routinely used by domain experts 
to understand why distributed protocols are correct, so that providing a phase 
structure reuses this existing intuition. 


1 Introduction 


Infinite-state systems such as distributed protocols remain challenging to verify despite 
decades of work developing interactive and automated proof techniques. Such proofs 
rely on the fundamental notion of an inductive invariant. Unfortunately, specifying 
inductive invariants is difficult for users, who must often repeatedly iterate through 
candidate invariants before achieving an inductive invariant. For example, the Verdi 
project’s proof of the Raft consensus protocol used an inductive invariant with 90 con- 
juncts and relied on significant manual proof effort [61,62]. 

The dream of invariant inference is that users would instead be assisted by auto- 
matic procedures that could infer the required invariants. While other domains have 
seen successful applications of invariant inference, using techniques such as abstract 
interpretation [18] and property-directed reachability [10,21], existing inference tech- 
niques fall short for interesting distributed protocols, and often diverge while searching 
for an invariant. These limitations have hindered adoption of invariant inference. 


© The Author(s) 2019 
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 405–425, 2019.

Our Approach. The idea of this paper is that invariant inference can be made drastically
more effective by utilizing user-guidance in the form of phase structures. We
propose user-guided invariant inference, in which the user provides some additional 
information to guide the tool towards an invariant. An effective guidance method must 
(1) match users’ high-level intuition of the proof, and (2) convey information in a way 
that an automatic inference tool can readily utilize to direct the search. In this setting 
invariant inference turns a partial, high-level argument accessible to the user into a full, 
formal correctness proof, overcoming scenarios where procuring the proof completely 
automatically is unsuccessful. 

Our approach places phase invariants at the heart of both user interaction and algo- 
rithmic inference. Phase invariants have an automaton-based form that is well-suited to 
the domain of distributed protocols. They allow the user to convey a high-level tempo- 
ral intuition of why the protocol is correct in the form of a phase structure. The phase 
structure provides hints that direct the search and allow a more targeted generalization 
of states to invariants, which can facilitate inference where it is otherwise impossible. 

This paper makes the following contributions: 


(1) We present phase invariants, an automaton-based form of safety proofs, based on 
the distinct logical phases of a certain view of the system. Phase invariants closely 
match the way domain experts already think about the correctness of distributed 
protocols by state-machine refinement a la Lamport [e.g. 43]. 

(2) We describe an algorithm for inferring inductive phase invariants from phase structures.
The decomposition into phases given by the phase structure guides inference
towards an invariant. The algorithm finds a proof over the phase structure
or explains why no such proof exists. In this way, phase invariants facilitate user 
interaction with the algorithm. 

(3) Our algorithm reduces the problem of inferring inductive phase invariants from 
phase structures to the problem of solving a linear system of Constrained Horn 
Clauses (CHC), irrespective of the inference technique and the logic used. In the 
case of universally quantified phase inductive invariants for protocols modeled in 
EPR (motivated by previous deductive approaches [50,51,60]), we show how to 
solve the resulting CHC using a variant of PDR∀ [40].

(4) We apply this approach to the inference of invariants for several interesting dis- 
tributed protocols. (This is the first time invariant inference is applied to distributed 
protocols modeled in EPR.) In the examples considered by our evaluation, transforming
our high-level intuition about the protocol into a phase structure was relatively
straightforward. In most cases, the phase structures allowed our algorithm to outperform
an implementation of PDR∀ that does not exploit such structure, facilitating
invariant inference on examples beyond the state of the art and attaining faster
convergence.


Overall, our approach demonstrates that the seemingly inherent intractability of sift- 
ing through a vast space of candidate invariants can be mitigated by leveraging users’ 
high-level intuition. 
2 Preliminaries


In this section we provide background on first-order transition systems. Sorts are omitted
for simplicity. Our results also extend to logics with a background theory.


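Notation. $\mathrm{FV}(\varphi)$ denotes the set of free variables of $\varphi$. $\mathcal{F}_{\Sigma}(V)$ denotes the set of first-order
formulas over vocabulary $\Sigma$ with $\mathrm{FV}(\varphi) \subseteq V$. We write $\forall V.\ \varphi \Longrightarrow \psi$ to denote
that the formula $\forall V.\ \varphi \rightarrow \psi$ is valid. We sometimes use $f_a$ as a shorthand for $f(a)$.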


Transition Systems. We represent transition systems symbolically, via formulas in first-order
logic. The definitions are standard. A vocabulary $\Sigma$ consisting of constant, function,
and relation symbols is used to represent states. Post-states of transitions are represented
by a copy of $\Sigma$ denoted $\Sigma' = \{a' \mid a \in \Sigma\}$. A first-order transition system
over $\Sigma$ is a tuple $\mathit{TS} = (\mathit{Init}, \mathit{TR})$, where $\mathit{Init} \in \mathcal{F}_{\Sigma}(\emptyset)$ describes the initial states, and
$\mathit{TR} \in \mathcal{F}_{\hat{\Sigma}}(\emptyset)$ with $\hat{\Sigma} = \Sigma \uplus \Sigma'$ describes the transition relation. The states of TS are
first-order structures over $\Sigma$. A state $s$ is initial if $s \models \mathit{Init}$. A transition of TS is a
pair of states $s_1, s_2$ over a shared domain such that $(s_1, s_2) \models \mathit{TR}$, $(s_1, s_2)$ being the
structure over that domain in which $\Sigma$ is interpreted as in $s_1$ and $\Sigma'$ as in $s_2$. $s_1$ is also
called the pre-state and $s_2$ the post-state. Traces are finite sequences of states $\sigma_1, \sigma_2, \ldots$
starting from an initial state such that there is a transition between each pair of consecutive
states. The reachable states are those that reside on traces starting from an initial
state.


Safety. A safety property $P$ is a formula in $\mathcal{F}_{\Sigma}(\emptyset)$. We say that TS is safe, and that $P$ is
an invariant, if all the reachable states satisfy $P$. $\mathit{Inv} \in \mathcal{F}_{\Sigma}(\emptyset)$ is an inductive invariant
if (i) $\mathit{Init} \Longrightarrow \mathit{Inv}$ (initiation), and (ii) $\mathit{Inv} \wedge \mathit{TR} \Longrightarrow \mathit{Inv}'$ (consecution), where $\mathit{Inv}'$ is
obtained from $\mathit{Inv}$ by replacing each symbol from $\Sigma$ with its primed counterpart. If also
(iii) $\mathit{Inv} \Longrightarrow P$ (safety), then it follows that TS is safe.
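
As a minimal illustration of the three conditions, the following Python sketch replaces first-order formulas by predicates over a tiny, hypothetical finite state space (integers 0 to 7), so that each validity check becomes an exhaustive loop over states or pairs of states. The names init, tr, check, P and even are illustrative assumptions, and exhaustive enumeration only works for finite toy systems, not for the symbolic first-order setting above. The example shows that P can hold on all reachable states (0, 2, 4, 6) without being inductive, while a stronger formula is inductive.

```python
# Toy explicit-state analogue of TS = (Init, TR); all names are hypothetical.
STATES = range(8)


def init(x):
    """Init: the counter starts at 0."""
    return x == 0


def tr(x, y):
    """TR: the counter may increase by 2 while it is below 6."""
    return x < 6 and y == x + 2


def check(inv, safety):
    """Return (initiation, consecution, safety) for a candidate invariant inv."""
    initiation = all(inv(x) for x in STATES if init(x))
    consecution = all(inv(y) for x in STATES for y in STATES
                      if inv(x) and tr(x, y))
    safe = all(safety(x) for x in STATES if inv(x))
    return initiation, consecution, safe


def P(x):
    """Safety property: the counter never equals 7."""
    return x != 7


def even(x):
    """Candidate inductive invariant that strengthens P."""
    return x % 2 == 0


if __name__ == "__main__":
    print(check(P, P))     # (True, False, True): 5 satisfies P but steps to 7
    print(check(even, P))  # (True, True, True)
```

Consecution fails for P because the unreachable state 5 satisfies P yet has a successor violating it; strengthening P to even rules out such states.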


3 Running Example: Distributed Key-Value Store


We begin with a description of the running example we refer to throughout the paper. 

The sharded key-value store with retransmissions (KV-R), adapted from
IronFleet [33, §5.2.1], is a distributed hash table where each node owns a subset of the
keys, and keys can be dynamically transferred among nodes to balance load. The safety 
property ensures that each key is globally associated with one value, even in the pres- 
ence of key transfers. Messages might be dropped by the network, and the protocol uses 
retransmissions and sequence numbers to maintain availability and safety. 

Figure 1 shows code modeling the protocol in a relational first-order language akin
to Ivy [45], which compiles to EPR transition systems. The state of nodes and the
network is modeled by global relations. Lines 1 to 4 declare uninterpreted sorts for keys,
values, clients, and sequence numbers. Lines 6 to 14 describe the state, consisting of: (i)
local state of clients pertaining to the table (which nodes are owners of which keys, and
the local shard of the table mapping keys to values); (ii) local state of clients pertaining
to sent and received messages (seqnum_sent, unacked, seqnum_recvd); and (iii) the state
of the network, comprising two kinds of messages (transfer_msg, ack_msg). Each
message kind is modeled as a relation whose first two arguments indicate the source
and destination of the message, and the rest carry the message's payload. For example,
ack_msg is a relation over two nodes and a sequence number, with the intended meaning
that a tuple (c1, c2, s) is in ack_msg exactly when there is a message in the network from
c1 to c2 acknowledging a message with sequence number s.

     1  type key
     2  type value
     3  type node
     4  type seqnum
     5
     6  relation owner: node, key
     7  relation table: node, key, value
     8  relation transfer_msg: node, node,
     9                         key, value, seqnum
    10  relation ack_msg: node, node, seqnum
    11  relation seqnum_sent: node, seqnum
    12  relation unacked: node, node,
    13                    key, value, seqnum
    14  relation seqnum_recvd: node, node, seqnum
    15
    16  init ∀n1, n2, k. owner(n1, k) ∧ owner(n2, k)
    17                   → n1 = n2
    18  init // all other relations are empty
    19
    20  action reshard(n_old:node, n_new:node,
    21                 k:key, v:value, s:seqnum)
    22    require table(n_old, k, v)
    23            ∧ ¬seqnum_sent(n_old, s)
    24    seqnum_sent(n_old, s) := true
    25    table(n_old, k, v) := false
    26    owner(n_old, k) := false
    27    transfer_msg(n_old, n_new, k, v, s) := true
    28    unacked(n_old, n_new, k, v, s) := true
    29
    30  action drop_transfer_msg(src:node, dst:node,
    31                           k:key, v:value, s:seqnum)
    32    require transfer_msg(src, dst, k, v, s)
    33    transfer_msg(src, dst, k, v, s) := false
    34
    35  action retransmit(src:node, dst:node,
    36                    k:key, v:value, s:seqnum)
    37    require unacked(src, dst, k, v, s)
    38    transfer_msg(src, dst, k, v, s) := true
    39  action recv_transfer_msg(src:node, n:node,
    40                           k:key, v:value, s:seqnum)
    41    require transfer_msg(src, n, k, v, s)
    42            ∧ ¬seqnum_recvd(n, src, s)
    43    seqnum_recvd(n, src, s) := true
    44    table(n, k, v) := true
    45    owner(n, k) := true
    46
    47  action send_ack(src:node, n:node,
    48                  k:key, v:value, s:seqnum)
    49    require transfer_msg(src, n, k, v, s)
    50            ∧ seqnum_recvd(n, src, s)
    51    ack_msg(src, n, s) := true
    52
    53  action drop_ack_msg(src:node, dst:node,
    54                      k:key, s:seqnum)
    55    require ack_msg(src, dst, s)
    56    ack_msg(src, dst, s) := false
    57
    58  action recv_ack_msg(src:node, dst:node,
    59                      k:key, s:seqnum)
    60    require ack_msg(src, dst, s)
    61    unacked(src, dst, *, *, s) := false
    62
    63  action put(n:node, k:key, v:value)
    64    require owner(n, k)
    65    table(n, k, *) := false
    66    table(n, k, v) := true
    67
    68  safety ∀k, n1, n2, v1, v2.
    69      table(n1, k, v1) ∧
    70      table(n2, k, v2) →
    71      n1 = n2 ∧ v1 = v2

Fig. 1. Sharded key-value store with retransmissions (KV-R) in a first-order relational modeling.

The initial states are specified in Lines 16 to 18. Transitions are specified by the
actions declared in Lines 20 to 66. Actions can fire nondeterministically at any time when
their precondition (require statements) holds. Hence, the transition relation consists
of the disjunction of the transition relations induced by the actions. The state is mutated
by modifying the relations. For example, message sends are modeled by inserting a tuple
into the corresponding relation (e.g. line 27), while message receives are modeled by
requiring a tuple to be in the relation (e.g. line 32), and then removing it (e.g. line 33).
The updates in lines 61 and 65 remove a set of tuples matching the pattern.

Transferring keys between nodes begins by sending a transfer_msg from the owner
to a new node (line 20), which stores the key-value pair when it receives the message
(line 39). Upon sending a transfer message the original node cedes ownership (line 26)
and does not send new transfer messages. Transfer messages may be dropped (line 30).
To ensure that the key-value pair is not lost, retransmissions are performed (line 35) with
the same sequence number until the target node acknowledges (which occurs in line
47). Acknowledgment messages themselves may be dropped (line 53). Sequence numbers
protect against delayed transfer messages, which might contain old values (line 42).

Lines 68 to 71 specify the key safety property: at most one value is associated with
any key, anywhere in the network. Intuitively, the protocol satisfies this because each 
key k is either currently (1) owned by a node, in which case this node is unique, or 
(2) it is in the process of transferring between nodes, in which case the careful use of 
sequence numbers ensures that the destination of the key is unique. As is typical, it is 
not straightforward to translate this intuition into a full correctness proof. In particular, 
it is necessary to relate all the different components of the state, including clients’ local 
state and pending messages. 
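
The role of the sequence-number check in line 42 can be explored with a bounded, explicit-state experiment. The Python sketch below is a hypothetical encoding of Fig. 1 for a tiny instance (two nodes, a single key whose argument is therefore dropped, two values, one sequence number). It enumerates all reachable states by breadth-first search and checks the safety property of lines 68 to 71. The names explore, successors and check_seqnum are illustrative assumptions; passing check_seqnum=False disables the ¬seqnum_recvd guard of line 42, a deliberately broken variant. A bounded search of this kind gives evidence only for the chosen instance; it is not a proof, which is why an inductive invariant is still needed.

```python
from itertools import product

# Hypothetical explicit-state encoding of KV-R (Fig. 1) for a tiny instance:
# 2 nodes, one key (so the key argument is dropped), 2 values, 1 seqnum.
NODES, VALUES, SEQS = (0, 1), (0, 1), (0,)
REL = ("owner", "table", "transfer_msg", "ack_msg",
       "seqnum_sent", "unacked", "seqnum_recvd")


def mk(**rels):
    """Build an immutable state; relations not given are empty."""
    return tuple(frozenset(rels.get(r, ())) for r in REL)


def upd(st, r, add=(), remove=()):
    """Copy of st where tuples are first removed from, then added to, relation r."""
    i = REL.index(r)
    return st[:i] + ((st[i] - frozenset(remove)) | frozenset(add),) + st[i + 1:]


def successors(st, check_seqnum=True):
    """Post-states of all enabled action instances (TR is their disjunction)."""
    owner, table, tm, ack, sent, unacked, recvd = st
    for n_old, n_new, v, s in product(NODES, NODES, VALUES, SEQS):  # reshard
        if (n_old, v) in table and (n_old, s) not in sent:
            t = upd(st, "seqnum_sent", add=[(n_old, s)])
            t = upd(t, "table", remove=[(n_old, v)])
            t = upd(t, "owner", remove=[n_old])
            t = upd(t, "transfer_msg", add=[(n_old, n_new, v, s)])
            yield upd(t, "unacked", add=[(n_old, n_new, v, s)])
    for m in product(NODES, NODES, VALUES, SEQS):
        src, dst, v, s = m
        if m in tm:  # drop_transfer_msg
            yield upd(st, "transfer_msg", remove=[m])
        if m in unacked:  # retransmit
            yield upd(st, "transfer_msg", add=[m])
        # recv_transfer_msg; the guard below is line 42
        if m in tm and (not check_seqnum or (dst, src, s) not in recvd):
            t = upd(st, "seqnum_recvd", add=[(dst, src, s)])
            t = upd(t, "table", add=[(dst, v)])
            yield upd(t, "owner", add=[dst])
        if m in tm and (dst, src, s) in recvd:  # send_ack
            yield upd(st, "ack_msg", add=[(src, dst, s)])
    for a in ack:  # drop_ack_msg, then recv_ack_msg
        yield upd(st, "ack_msg", remove=[a])
        yield upd(st, "unacked",
                  remove=[u for u in unacked if (u[0], u[1], u[3]) == a])
    for n, v in product(NODES, VALUES):  # put
        if n in owner:
            t = upd(st, "table", remove=[(n, w) for w in VALUES])
            yield upd(t, "table", add=[(n, v)])


def safe(st):
    """Lines 68-71 for one key: at most one (node, value) pair in table."""
    return len(st[REL.index("table")]) <= 1


def explore(check_seqnum=True):
    """BFS over reachable states; returns (#states seen, first unsafe state or None)."""
    init = [mk()] + [mk(owner=[n]) for n in NODES]  # at most one owner, rest empty
    seen, frontier = set(init), list(init)
    while frontier:
        nxt = []
        for st in frontier:
            if not safe(st):
                return len(seen), st
            for t in successors(st, check_seqnum):
                if t not in seen:
                    seen.add(t)
                    nxt.append(t)
        frontier = nxt
    return len(seen), None
```

On this instance, explore() should report no violation, whereas explore(check_seqnum=False) finds a state in which two nodes store the key. That state arises when a stale transfer message, still present in the network, is delivered again after the key has been passed back to its original sender.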

Invariant inference strives to automatically find an inductive invariant establishing
safety. This example is challenging for existing inference techniques (Sect. 6). This
paper proposes user-guided invariant inference based on phase invariants to overcome
this challenge. The rest of the paper describes our approach, in which inference is provided
with the phase structure in Fig. 2, matching the high-level intuitive explanation
above. The algorithm then automatically infers facts about each phase to obtain an
inductive invariant. Sect. 4 describes phase structures and inductive phase invariants,
and Sect. 5 explains how these are used in user-guided invariant inference.


4 Phase Structures and Invariants 


In this section we introduce phase structures and inductive phase invariants. These are 
used for guiding automatic invariant inference in Sect. 5. Proofs appear in [24]. 


4.1 Phase Invariants 


Definition 1 (Quantified Phase Automaton). A quantified phase automaton (phase
automaton for short) over $\Sigma$ is a tuple $\mathcal{A} = (Q, \iota, V, \delta, \varphi)$ where: $Q$ is a finite set
of phases. $\iota \in Q$ is the initial phase. $V$ is a set of variables, called the automaton's
quantifiers. $\delta : Q \times Q \to \mathcal{F}_{\hat{\Sigma}}(V)$ is a function labeling every pair of phases by a
transition relation formula, such that $\mathrm{FV}(\delta(q,p)) \subseteq V$ for every $(q,p) \in Q \times Q$. $\varphi :
Q \to \mathcal{F}_{\Sigma}(V)$ is a function labeling every phase by a phase characterization formula,
s.t. $\mathrm{FV}(\varphi_q) \subseteq V$ for every $q \in Q$.


Intuitively, V should be understood as free variables that are implicitly universally 
quantified outside of the automaton’s scope. For each assignment to these variables, 
the automaton represents the progress along the phases from the point of view of this 
assignment, and thus V is also called the view (or view quantifiers).

We refer to $(Q, \iota, V, \delta)$, where $\varphi$ is omitted, as the phase structure (or the automaton
structure) of $\mathcal{A}$. We refer to $R = \{(q,p) \in Q \times Q \mid \delta(q,p) \not\equiv \mathit{false}\}$ as the edges of $\mathcal{A}$.
A trace of $\mathcal{A}$ is a sequence of phases $q_0, \ldots, q_n$ such that $q_0 = \iota$ and $(q_i, q_{i+1}) \in R$
for every $0 \le i < n$. We say that $\mathcal{A}$ is deterministic if for every $(q, p_1), (q, p_2) \in R$ s.t.
$p_1 \neq p_2$, the formula $\delta(q,p_1) \wedge \delta(q,p_2)$ is unsatisfiable.
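
For intuition, the following Python sketch checks determinism of a phase structure in an explicit-state setting. It assumes a hypothetical, heavily simplified single-key abstraction of the running example, not the model of Fig. 1: a state is a pair (owner, dest), where owner is the node owning the key and dest is the destination of an in-flight transfer, each possibly None. The view quantifiers V are assumed empty, and edge labels δ(q, p) are Python predicates on (pre-state, post-state) pairs, so "unsatisfiable" becomes "no pair of states satisfies both labels". The action names mirror Fig. 1 but their bodies are illustrative assumptions.

```python
from itertools import product

# Hypothetical single-key abstraction: state = (owner, dest), None = absent.
NODES = (0, 1)
STATES = list(product((None,) + NODES, repeat=2))


def reshard(s, t):
    """The owner gives up the key and sends it to some node."""
    return s[0] is not None and t[0] is None and t[1] is not None


def recv(s, t):
    """The destination of an in-flight transfer becomes the owner."""
    return s[1] is not None and t == (s[1], None)


def put(s, t):
    """The owner updates its value; abstracted here as a stutter step."""
    return s[0] is not None and t == s


def tr(s, t):
    return reshard(s, t) or recv(s, t) or put(s, t)


def is_deterministic(delta):
    """True iff no two distinct outgoing edges of a phase share a transition."""
    for q in {q for q, _ in delta}:
        labels = [lab for (q2, _), lab in delta.items() if q2 == q]
        for i, l1 in enumerate(labels):
            for l2 in labels[i + 1:]:
                if any(l1(s, t) and l2(s, t) for s in STATES for t in STATES):
                    return False
    return True


DELTA = {("O", "O"): put, ("O", "T"): reshard, ("T", "O"): recv}
# Labeling O's self-loop with all of TR makes it overlap with the O -> T edge.
DELTA_OVERLAP = {("O", "O"): tr, ("O", "T"): reshard, ("T", "O"): recv}
```

Here is_deterministic(DELTA) holds, because put keeps an owner while reshard removes it, whereas DELTA_OVERLAP is not deterministic, since every reshard step also satisfies the self-loop label.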


Example 1. Figure 2 shows a phase automaton for the running example, with the view
of a single key k. It describes the protocol as transitioning between two distinct (logical)
phases of k: owned (O[k]) and transferring (T[k]). The edges are labeled by actions of
the system. A wildcard * means that the action is executed with an arbitrary argument.
The two central actions are (i) reshard, which transitions from O[k] to T[k] but cannot
execute in T[k], and (ii) recv_transfer_msg, which does the opposite. The rest
of the actions do not cause a phase change and appear on a self-loop in each phase.
Actions related to keys other than k are considered self-loops, and are omitted here for
brevity. Some actions are disallowed in certain phases, that is, they do not label any
outgoing edge of that phase, such as recv_transfer_msg(k) in O[k]. Characterizations
for each phase are depicted in Fig. 2 (bottom). Without them, Fig. 2 represents a phase
structure, which serves as the input to our inference algorithm. We remark that the
choice of automaton aims to reflect the safety property of interest. In our example, one
might instead imagine taking the view of a single node as it interacts with multiple keys,
which might seem intuitive from the standpoint of implementing the system. However,
it is not appropriate for the proof of value uniqueness, since keys pass in and out of the
view of a single client.

Phase structure (top of Fig. 2), with O[k] as the initial phase:
    O[k] → T[k]: reshard(*,*,k,*,*)
    T[k] → O[k]: recv_transfer_msg(*,*,k,*,*)
    self-loops on both O[k] and T[k]: drop_transfer_msg(*,*,k,*,*), retransmit(*,*,k,*,*),
        send_ack(*,*,k,*,*), drop_ack_msg(*,*,k,*), recv_ack_msg(*,*,k,*), put(*,k,*)

Phase characterizations (bottom of Fig. 2):
    72  phase O[k]:
    73    invariant ∀n1, n2. owner(n1, k) ∧ owner(n2, k) → n1 = n2
    74    invariant ∀n, v. table(n, k, v) → owner(n, k)
    75    invariant ∀src, dst, v, s. ¬(transfer_msg(src, dst, k, v, s) ∧ ¬seqnum_recvd(dst, src, s))
    76    invariant ∀n1, n2, v1, v2. table(n1, k, v1) ∧ table(n2, k, v2) → n1 = n2 ∧ v1 = v2
    77    invariant ∀src, dst, v, s. ¬(unacked(src, dst, k, v, s) ∧ ¬seqnum_recvd(dst, src, s))
    78
    79  phase T[k]:
    80    invariant ∀n. ¬owner(n, k)
    81    invariant ∀n, v. table(n, k, v) → owner(n, k)
    82    invariant ∀src1, src2, dst1, dst2, v1, v2, s1, s2. transfer_msg(src1, dst1, k, v1, s1) ∧ ¬seqnum_recvd(dst1, src1, s1)
    83        ∧ transfer_msg(src2, dst2, k, v2, s2) ∧ ¬seqnum_recvd(dst2, src2, s2) → (src1, dst1, v1, s1) = (src2, dst2, v2, s2)
    84    invariant ∀src1, src2, dst1, dst2, v1, v2, s1, s2. transfer_msg(src1, dst1, k, v1, s1) ∧ ¬seqnum_recvd(dst1, src1, s1)
    85        ∧ unacked(src2, dst2, k, v2, s2) ∧ ¬seqnum_recvd(dst2, src2, s2) → (src1, dst1, v1, s1) = (src2, dst2, v2, s2)
    86    invariant ∀src1, src2, dst1, dst2, v1, v2, s1, s2. unacked(src1, dst1, k, v1, s1) ∧ ¬seqnum_recvd(dst1, src1, s1)
    87        ∧ unacked(src2, dst2, k, v2, s2) ∧ ¬seqnum_recvd(dst2, src2, s2) → (src1, dst1, v1, s1) = (src2, dst2, v2, s2)

Fig. 2. Phase structure for key-value store (top) and phase characterizations (bottom). The user
provides the phase structure, and inference automatically produces the phase characterizations,
forming a safe inductive phase automaton.

We now formally define phase invariants as phase automata that overapproximate 
the behaviors of the original system. 


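Definition 2 (Language of Phase Automaton). Let $\mathcal{A}$ be a quantified phase automaton
over $\Sigma$, and $\bar{\sigma} = \sigma_0, \ldots, \sigma_n$ a finite sequence of states over $\Sigma$, all with domain $D$.
Let $v : V \to D$ be a valuation of the automaton quantifiers. We say that:

- $\bar{\sigma}, v \models \mathcal{A}$ if there exists a trace of phases $q_0, \ldots, q_n$ such that $(\sigma_i, \sigma_{i+1}), v \models
\delta(q_i, q_{i+1})$ for every $0 \le i < n$ and $\sigma_i, v \models \varphi_{q_i}$ for every $0 \le i \le n$.
- $\bar{\sigma} \models \mathcal{A}$ if $\bar{\sigma}, v \models \mathcal{A}$ for every valuation $v$.

The language of $\mathcal{A}$ is $\mathcal{L}(\mathcal{A}) = \{\bar{\sigma} \mid \bar{\sigma} \models \mathcal{A}\}$.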

Definition 3 (Phase Invariant). A phase automaton $\mathcal{A}$ is a phase invariant for a transition
system TS if $\mathcal{L}(\mathit{TS}) \subseteq \mathcal{L}(\mathcal{A})$, where $\mathcal{L}(\mathit{TS})$ denotes the set of finite traces of TS.
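
Under simplifying assumptions, membership in the language of a phase automaton can be decided by tracking the set of phases consistent with each prefix of a trace. The Python sketch below assumes a hypothetical single-key abstraction of the running example (states are pairs (owner, dest), each possibly None), empty view quantifiers V so that valuations play no role, and Python predicates in place of formulas; the names accepts and ts_traces are illustrative. It then checks the inclusion of Definition 3 on all traces of the toy system up to a small length, which is only a bounded sanity check, not a proof.

```python
from itertools import product

# Hypothetical single-key abstraction: state = (owner, dest), None = absent.
NODES = (0, 1)
STATES = list(product((None,) + NODES, repeat=2))


def reshard(s, t):
    return s[0] is not None and t[0] is None and t[1] is not None


def recv(s, t):
    return s[1] is not None and t == (s[1], None)


def put(s, t):
    return s[0] is not None and t == s


def init(s):
    return s[1] is None  # at most one owner, nothing in flight


def tr(s, t):
    return reshard(s, t) or recv(s, t) or put(s, t)


INITIAL = "O"
DELTA = {("O", "O"): put, ("O", "T"): reshard, ("T", "O"): recv}
PHI = {"O": lambda s: s[1] is None,
       "T": lambda s: s[0] is None and s[1] is not None}


def accepts(trace):
    """trace |= A: some phase trace from INITIAL fits every step and every state."""
    if not trace or not PHI[INITIAL](trace[0]):
        return False
    phases = {INITIAL}  # phases q_i consistent with the prefix read so far
    for s, t in zip(trace, trace[1:]):
        phases = {p for (q, p), label in DELTA.items()
                  if q in phases and label(s, t) and PHI[p](t)}
        if not phases:
            return False
    return True


def ts_traces(n):
    """All traces of the toy TS with exactly n states (exponential; toy use only)."""
    traces = [[s] for s in STATES if init(s)]
    for _ in range(n - 1):
        traces = [prefix + [t] for prefix in traces
                  for t in STATES if tr(prefix[-1], t)]
    return traces
```

For example, the trace (0, None), (None, 1), (1, None), which moves the key from node 0 to node 1, is accepted, while (0, None), (1, None), which changes the owner without a transfer, is rejected.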


Example 2. The phase automaton of Fig. 2 is a phase invariant for the protocol: intu- 
itively, whenever an execution of the protocol reaches a phase, its characterizations 
hold. This fact may not be straightforward to establish. To this end we develop the 
notion of inductive phase invariants. 


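4.2 Establishing Safety and Phase Invariants with Inductive Phase Invariants

To establish phase invariants, we use inductiveness:

Definition 4 (Inductive Phase Invariant). $\mathcal{A}$ is inductive w.r.t. $\mathit{TS} = (\mathit{Init}, \mathit{TR})$ if:

Initiation: $\mathit{Init} \Longrightarrow (\forall V.\ \varphi_{\iota})$.
Inductiveness: for all $(q,p) \in R$, $\forall V.\ (\varphi_q \wedge \delta(q,p) \Longrightarrow \varphi_p')$.
Edge Covering: for every $q \in Q$, $\forall V.\ (\varphi_q \wedge \mathit{TR} \Longrightarrow \bigvee_{(q,p) \in R} \delta(q,p))$.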
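
For a finite toy system, the three conditions, together with the safety condition of Definition 5 below, can be checked exhaustively. The Python sketch assumes a hypothetical single-key abstraction of the running example (states are pairs (owner, dest), each possibly None), empty view quantifiers V, and Python predicates in place of formulas; check, PHI and WEAK are illustrative names. It shows that trivially weak characterizations (true in every phase) satisfy Initiation and Inductiveness but violate Edge Covering and safety, because transitions that the automaton does not allow in a phase are then no longer excluded.

```python
from itertools import product

# Hypothetical single-key abstraction: state = (owner, dest), None = absent.
NODES = (0, 1)
STATES = list(product((None,) + NODES, repeat=2))
PHASES, INITIAL = ("O", "T"), "O"


def reshard(s, t):
    return s[0] is not None and t[0] is None and t[1] is not None


def recv(s, t):
    return s[1] is not None and t == (s[1], None)


def put(s, t):
    return s[0] is not None and t == s


def init(s):
    return s[1] is None


def tr(s, t):
    return reshard(s, t) or recv(s, t) or put(s, t)


def safety(s):
    """The key is never both owned and in flight."""
    return s[0] is None or s[1] is None


DELTA = {("O", "O"): put, ("O", "T"): reshard, ("T", "O"): recv}


def check(phi):
    """(initiation, inductiveness, edge covering, safety) for characterizations phi."""
    pairs = [(s, t) for s in STATES for t in STATES]
    initiation = all(phi[INITIAL](s) for s in STATES if init(s))
    inductiveness = all(phi[p](t) for (q, p), lab in DELTA.items()
                        for s, t in pairs if phi[q](s) and lab(s, t))
    edge_covering = all(any(lab(s, t) for (q2, _), lab in DELTA.items() if q2 == q)
                        for q in PHASES for s, t in pairs
                        if phi[q](s) and tr(s, t))
    safe = all(safety(s) for q in PHASES for s in STATES if phi[q](s))
    return initiation, inductiveness, edge_covering, safe


PHI = {"O": lambda s: s[1] is None,
       "T": lambda s: s[0] is None and s[1] is not None}
WEAK = {q: (lambda s: True) for q in PHASES}
```

With PHI all four conditions hold. With WEAK, a recv step from phase O is not covered by any outgoing edge of O, so Edge Covering fails.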


Example 3. The phase automaton in Fig. 2 is an inductive phase invariant. For example, 
the only disallowed transition in O[k] is recv_transfer_msg, which indeed
cannot execute in O[k] according to the characterization in line 75. Further, if, for 
example, a protocol’s transition from O[k] matches the labeling of the edge to T[k] 
(i.e. a reshard action on k), the post-state necessarily satisfies the characterizations of 
T[k]: for instance, the post-state satisfies the uniqueness of unreceived transfer mes- 
sages (line 82) because in the pre-state there are none (line 75). 


Lemma 1. If $\mathcal{A}$ is inductive w.r.t. TS, then it is a phase invariant for TS.


Remark 1. The careful reader may notice that the inductiveness requirement is stronger
than needed to ensure that the characterizations form a phase invariant. It could be
weakened to require, for every $q \in Q$: $\forall V.\ (\varphi_q \wedge \mathit{TR} \Longrightarrow \bigvee_{(q,p) \in R} (\delta(q,p) \wedge \varphi_p'))$. However,
as we explain in Sect. 5, our notion of inductiveness is crucial for inferring inductive
phase automata, which is the goal of this paper. Furthermore, for deterministic phase
automata, the two requirements coincide.


Inductive Invariants vs. Inductive Phase Invariants. Inductive invariants and inductive 
phase invariants are closely related: 


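Lemma 2. If $\mathcal{A}$ is inductive w.r.t. TS, then $\forall V.\ \bigvee_{q \in Q} \varphi_q$ is an inductive invariant
for TS. If $\mathit{Inv}$ is an inductive invariant for TS, then the phase automaton $\mathcal{A}_{\mathit{Inv}} =
(\{q\}, q, \emptyset, \delta, \varphi)$, where $\delta(q,q) = \mathit{TR}$ and $\varphi_q = \mathit{Inv}$, is an inductive phase automaton
w.r.t. TS.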


In this sense, inductive phase invariants are as expressive as inductive invariants. However,
as we show in this paper, their structure can be used by a user as an intuitive way
to guide an automatic invariant inference algorithm. 


Safe Inductive Phase Invariants. Next we show that an inductive phase invariant can 
be used to establish safety. 

Definition 5 (Safe Phase Automaton). Let $\mathcal{A}$ be a phase automaton over $\Sigma$ with
quantifiers $V$. Then $\mathcal{A}$ is safe w.r.t. $\forall V.\ P$ if $\forall V.\ (\varphi_q \Longrightarrow P)$ holds for every $q \in Q$.


Lemma 3. If $\mathcal{A}$ is inductive w.r.t. TS and safe w.r.t. $\forall V.\ P$, then $\forall V.\ P$ is an invariant
of TS.


5 Inference of Inductive Phase Invariants 


In this section we turn to the inference of safe inductive phase invariants over a given 
phase structure, which guides the search. Formally, the problem we target is: 


Definition 6 (Inductive Phase Invariant Inference). Given a transition system $\mathit{TS} =
(\mathit{Init}, \mathit{TR})$, a phase structure $\mathcal{S} = (Q, \iota, V, \delta)$ and a safety property $\forall V.\ P$, all over $\Sigma$,
find a safe inductive phase invariant $\mathcal{A}$ for TS over the phase structure $\mathcal{S}$, if one exists.


Example 4. Inference of an inductive phase invariant is provided with the phase struc- 
ture in Fig. 2, which embodies an intuitive understanding of the different phases the 
protocol undergoes (see Example 1). The algorithm automatically finds phase charac- 
terizations forming a safe inductive phase invariant over the user-provided structure. We 
note that inference is valuable even after a phase structure is provided: in the running 
example, finding an inductive phase invariant is not easy; in particular, the characteri- 
zations in Fig. 2 relate different parts of the state and involve multiple quantifiers. 


5.1 Reduction to Constrained Horn Clauses 


We view each unknown phase characterization $\varphi_q$, which we aim to infer for every
$q \in Q$, as a predicate $I_q$. The definition of a safe inductive phase invariant induces a set
of second-order Constrained Horn Clauses (CHC) over $\{I_q\}_{q \in Q}$:


Initiation. $\mathit{Init} \Longrightarrow (\forall V.\ I_{\iota})$   (1)

Inductiveness. For every $(q,p) \in R$: $\forall V.\ (I_q \wedge \delta(q,p) \Longrightarrow I_p')$   (2)

Edge Covering. For every $q \in Q$: $\forall V.\ (I_q \wedge \mathit{TR} \Longrightarrow \bigvee_{(q,p) \in R} \delta(q,p))$   (3)

Safety. For every $q \in Q$: $\forall V.\ (I_q \Longrightarrow P)$   (4)


where $V$ denotes the quantifiers of $\mathcal{A}$. All the constraints are linear, namely, at most
one unknown predicate appears on the left-hand side of each implication.

Constraint (4) captures the original safety requirement, whereas (3) can be under- 
stood as additional safety properties that are specified by the phase automaton (since no 
unknown predicates appear in the righthand side of the implications). 
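
To make the shape of the CHC system concrete, the following Python sketch emits constraints (1) to (4) as plain strings for a given phase structure, instantiated with the two phases of Fig. 2, the view k, and the edges described in Example 1 (a transition edge in each direction plus a self-loop on each phase). The string syntax (I_q for unknowns, a trailing ' for the post-state copy, & and | for conjunction and disjunction) is an ad hoc notation chosen for illustration, not the input format of any particular CHC solver; chc_clauses and unknowns_on_lhs are hypothetical helpers. The helper also checks the linearity claim by counting unknowns on each left-hand side.

```python
def chc_clauses(phases, initial, edges, view="V"):
    """Constraints (1)-(4) over unknown predicates I_q, as readable strings."""
    clauses = [f"Init => (forall {view}. I_{initial})"]                    # (1)
    clauses += [f"forall {view}. (I_{q} & delta({q},{p}) => I_{p}')"        # (2)
                for q, p in edges]
    for q in phases:                                                       # (3)
        out = " | ".join(f"delta({q},{p})" for q2, p in edges if q2 == q)
        clauses.append(f"forall {view}. (I_{q} & TR => {out or 'false'})")
    clauses += [f"forall {view}. (I_{q} => P)" for q in phases]            # (4)
    return clauses


def unknowns_on_lhs(clause):
    """Number of unknown predicates I_q to the left of the implication."""
    return clause.split("=>")[0].count("I_")


phases = ["O[k]", "T[k]"]
edges = [("O[k]", "O[k]"), ("O[k]", "T[k]"), ("T[k]", "T[k]"), ("T[k]", "O[k]")]
clauses = chc_clauses(phases, "O[k]", edges, view="k")

if __name__ == "__main__":
    for c in clauses:
        print(c)
```

The system has one initiation clause, one inductiveness clause per edge, and one edge-covering and one safety clause per phase. In the edge-covering clauses the right-hand side contains no unknowns, which is why they act as additional safety properties.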

A solution $\mathcal{I}$ to the CHC system associates each predicate $I_q$ with a formula $\varphi_q$ over
$\Sigma$ (with $\mathrm{FV}(\varphi_q) \subseteq V$) such that when $\varphi_q$ is substituted for $I_q$, all the constraints are
satisfied (i.e., the corresponding first-order formulas are valid). A solution to the system
induces a safe inductive phase automaton through characterizing each phase $q$ by the
interpretation of $I_q$, and vice versa. Formally:

Lemma 4. Let $\mathcal{A} = (Q, \iota, V, \delta, \varphi)$ with $\varphi_q = \mathcal{I}(I_q)$. Then $\mathcal{A}$ is a safe inductive phase
invariant w.r.t. TS and $\forall V.\ P$ if and only if $\mathcal{I}$ is a solution to the CHC system.


Therefore, to infer a safe inductive phase invariant over a given phase structure, we 
need to solve the corresponding CHC system. In Sect. 6.1 we explain our approach for 
doing so for the class of universally quantified phase characterizations. Note that the 
weaker definition of inductiveness discussed in Remark 1 would prevent the reduction
to CHC, as it would result in clauses that are not Horn clauses.


Completeness of Inductive Phase Invariants. There are cases where a given phase
structure induces a safe phase invariant $\mathcal{A}$, but not an inductive one, making the CHC
system unsatisfiable. However, a strengthening into an inductive phase invariant can
always be used to prove that $\mathcal{A}$ is an invariant if (i) the language of invariants is
unrestricted, and (ii) the phase structure is deterministic, namely, it does not cover the same
transition in two outgoing edges. Determinism of the automaton does not lose generality
in the context of safety verification, since every inductive phase automaton can be
converted to a deterministic one; non-determinism is in fact not beneficial, as it mandates
the same state to be characterized by multiple phases (see also Remark 1). These topics
are discussed in detail in the extended version [24].


Remark 2. Each phase is associated with a set of states that can reach it, where a state
$\sigma$ can reach phase $q$ if there is a sequence of program transitions that results in $\sigma$ and
can lead to $q$ according to the automaton's transitions. This makes a phase structure
different from a simple syntactical disjunctive template for inference, in which such 
semantic meaning is unavailable. 
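
Remark 2 can be illustrated by computing, for a finite toy system, the exact set of states that can reach each phase; the characterizations that inference searches for are overapproximations of these sets. The Python sketch below assumes a hypothetical single-key abstraction of the running example (states are pairs (owner, dest), each possibly None), empty view quantifiers, and Python predicates in place of formulas; phase_reachable is an illustrative name. It runs a worklist over (phase, state) pairs. Because each edge label here is one of the toy actions, a step that satisfies a label is also a program transition; in general one would follow steps satisfying both TR and the label.

```python
from itertools import product

# Hypothetical single-key abstraction: state = (owner, dest), None = absent.
NODES = (0, 1)
STATES = list(product((None,) + NODES, repeat=2))
PHASES, INITIAL = ("O", "T"), "O"


def reshard(s, t):
    return s[0] is not None and t[0] is None and t[1] is not None


def recv(s, t):
    return s[1] is not None and t == (s[1], None)


def put(s, t):
    return s[0] is not None and t == s


def init(s):
    return s[1] is None


DELTA = {("O", "O"): put, ("O", "T"): reshard, ("T", "O"): recv}


def phase_reachable():
    """Map each phase q to the set of states that can reach q (Remark 2)."""
    reach = {q: set() for q in PHASES}
    work = [(INITIAL, s) for s in STATES if init(s)]
    while work:
        q, s = work.pop()
        if s in reach[q]:
            continue
        reach[q].add(s)
        for (q2, p), label in DELTA.items():
            if q2 == q:  # follow program steps allowed by edge (q, p)
                work.extend((p, t) for t in STATES if label(s, t))
    return reach


PHI = {"O": lambda s: s[1] is None,
       "T": lambda s: s[0] is None and s[1] is not None}
```

On this toy system, phase O is reached exactly by the states without an in-flight transfer and phase T by the states with a pending transfer and no owner. The characterizations in PHI contain these sets, which is what a solution to the CHC system must guarantee.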


5.2 Phase Structures as a Means to Guide Inference 


The search space of invariants over a phase structure is in fact larger than that of stan- 
dard inductive invariants, because each phase can be associated with different charac- 
terizations. Sometimes the disjunctive structure of the phases (Lemma 2) uncovers a 
significantly simpler invariant than exists in the syntactical class of standard inductive 
invariants explored by the algorithm, but this is not always the case.¹ Nonetheless, the
search for an invariant over the structure is guided, through the following aspects: 


(1) Phase decomposition. Inference of an inductive phase invariant aims to find charac- 
terizations that overapproximate the set of states reachable in each phase (Remark 
2). The distinction between phases is most beneficial when there is a considerable 
difference between the sets associated with different phases and their characteriza- 
tions. For instance, in the running example, all states without unreceived transfer 
messages are associated with O[k], whereas all states in which such messages
exist are associated with T[k]; this distinction is captured by the characterizations in
lines 75 and 82 in Fig. 2.


¹ As an illustration, the extended version [24] includes an inductive invariant for the running
example which is comparable in complexity to the inductive phase invariant in Fig. 2. 

Differences between phases have two consequences. First, since each phase
corresponds to fewer states than all reachable states, generalization, the key
ingredient in inference procedures, is more focused. The second consequence stems
from the fact that inductive characterizations of different phases are correlated. We
expect that a certain property is more readily learnable in one phase, while
related facts in other phases are more complex. For instance, the characterization
in line 75 in Fig. 2 is more straightforward than the one in line 82. Simpler facts
in one phase can help characterize an adjacent phase when the algorithm analyzes
how that property evolves along the edge. Thus, utilizing the phase structure can
improve the gradual construction of overapproximations of the sets of states
reachable in each phase.

(2) Disabled transitions. A phase automaton explicitly states which transitions of the
system are enabled in each phase, while the rest are disabled. Such impossible
transitions induce additional safety properties to be established by the inferred
phase characterizations. For example, the phase invariant in Fig. 2 forbids a
recv_transfer_msg(k) in O[k], a fact that can trigger the inference of
the characterization in line 75. These additional safety properties direct the search
towards characterizations that are likely to be important for the proof.

(3) Phase-awareness. Finally, while a phase structure can be encoded in several ways 
(such as ghost code), a key aspect of our approach is that the phase decomposi- 
tion and disabled transitions are explicitly encoded in the CHC system in Sect. 5.1, 
ensuring that they guide the otherwise heuristic search. 

In Sect. 6.2 we demonstrate the effects of aspects (1)-(3) on guidance. 


6 Implementation and Evaluation 


In this section we apply invariant inference guided by phase structures to distributed 
protocols modeled in EPR, motivated by previous deductive approaches [50,51,60].


6.1 Phase-PDR∀ for Inferring Universally Quantified Characterizations


We now describe our procedure for solving the CHC system of Sect. 5.1. It either (i)
returns universally quantified phase characterizations that induce a safe inductive phase 
invariant, (ii) returns an abstract counterexample trace demonstrating that this is not 
possible, or (iii) diverges. 


EPR. Our procedure handles transition systems expressed using the extended 
Effectively PRopositional fragment (EPR) of first order logic [51,52], and infers uni- 
versally quantified phase characterizations. Satisfiability of (extended) EPR formulas is 
decidable, enjoys the finite-model property, and is supported by solvers such as Z3 [46]
and iProver [41]. 


Phase-PDR∀. Our procedure is based on PDR∀ [40], a variant of PDR [10,21] that
infers universally quantified inductive invariants. PDR computes a sequence of frames
$F_0, \ldots, F_n$ such that $F_i$ overapproximates the set of states reachable in $i$ steps. In our
case, each frame $F_i$ is a mapping from a phase $q$ to characterizations. The details of the
algorithm are standard for PDR; we describe the gist of the procedure in the extended 
version [24]. We only stress the following: Counterexamples to safety take into account 
the safety property as well as disabled transitions. Search for predecessors is performed 
by going backwards on automaton edges, blocking counterexamples from preceding 
phases to prove an obligation in the current phase. Generalization is performed w.r.t. all 
incoming edges. As in PDR, proof obligations are constructed via diagrams [12]; in 
our setting these include the interpretation for the view quantifiers (see [24] for details). 
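As a rough illustration of what the frames track, the following explicit-state Python sketch enumerates, for a hypothetical finite toy lock (a server, one client, and a grant message; the state encoding, action names, and phase names are our own assumptions), the states reachable in each phase within at most i steps. In Phase-PDR∀ each frame must map every phase q to a characterization covering the i-th set for q. The actual procedure never enumerates states; it generalizes counterexamples into universally quantified clauses checked by Z3, so this is only a finite-state analogue.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy model: (server_free, grant_in_flight, client_holds).
State = Tuple[bool, bool, bool]

def steps(s: State) -> Iterable[Tuple[str, State]]:
    """Actions enabled in s, each paired with its successor state."""
    free, grant, holds = s
    if free and not grant:
        yield "grant", (False, True, holds)
    if grant:
        yield "recv_grant", (free, False, True)
    if holds:
        yield "release", (True, grant, False)

# Hypothetical phase automaton: phase -> {action: successor phase}.
# Actions without an edge from a phase are disabled in that phase.
EDGES: Dict[str, Dict[str, str]] = {
    "idle": {"grant": "granted"},
    "granted": {"recv_grant": "holding"},
    "holding": {"release": "idle"},
}

def bounded_phase_reach(init: State, init_phase: str, k: int) -> List[Dict[str, Set[State]]]:
    """R[i][q]: states reachable in phase q within at most i steps, for i = 0..k.
    A frame F_i of Phase-PDR∀ must map each q to a characterization covering R[i][q]."""
    current: Dict[str, Set[State]] = {q: set() for q in EDGES}
    current[init_phase].add(init)
    result = [{q: set(ss) for q, ss in current.items()}]
    for _ in range(k):
        nxt = {q: set(ss) for q, ss in current.items()}
        for q, states in current.items():
            for s in states:
                for action, s2 in steps(s):
                    # Disabled transitions are skipped here; a phase invariant
                    # must additionally prove that they never occur.
                    if action in EDGES[q]:
                        nxt[EDGES[q][action]].add(s2)
        current = nxt
        result.append({q: set(ss) for q, ss in current.items()})
    return result

R = bounded_phase_reach((True, False, False), "idle", 3)
for i, frame in enumerate(R):
    print(i, {q: sorted(ss) for q, ss in frame.items()})
```

In this toy, each phase ends up with exactly one reachable state, so the exact sets already form a phase invariant; in the unbounded, quantified setting the point of Phase-PDR∀ is to find compact overapproximations instead.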


Edge Covering Check in EPR. In our setting, Eqs. (1), (2) and (4) fall in EPR, but not 
Eq. (3). Thus, we restrict edge labeling so that each edge is labeled with a TR of an 
action, together with an alternation-free precondition. It then suffices to check impli- 
cations between the preconditions and the entire TR (see the extended version [24]). 
Such edge labeling is sufficiently expressive for all our examples. Alternatively, sound 
but incomplete bounded quantifier instantiation [23] could be used, potentially allowing 
more complex decompositions of TR. 


Absence of Inductive Phase Characterizations. What happens when the user gets the 
automaton wrong? One case is when there does not exist an inductive phase invariant 
with universal phase characterizations over the given structure. When this occurs, our 
tool can return an abstract counterexample trace—a sequence of program transitions 
and transitions of the automaton (inspired by [40,49])—which constitutes a proof of 
that fact (see the extended version [24]). The counterexample trace can assist the user 
in debugging the automaton or the program and modifying them. For instance, missing 
edges occurred frequently when we wrote the automata of Sect. 6, and we used the
generated counterexample traces to correct them. 
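A missing edge, the most common mistake mentioned above, can be illustrated with a minimal explicit-state sketch. It assumes a hypothetical finite toy lock (the state encoding, action names, and phase names are ours) and is not the tool's abstract counterexample search: a breadth-first search over (phase, state) pairs returns the first reachable transition that the automaton disables. A concrete trace of this kind shows that no phase invariant of any shape exists for the given structure, which is a stronger verdict than the abstract traces described above.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical toy lock: (server_free, grant_in_flight, client_holds).
State = Tuple[bool, bool, bool]
Node = Tuple[str, State]  # (phase, state)

def steps(s: State) -> Iterable[Tuple[str, State]]:
    free, grant, holds = s
    if free and not grant:
        yield "grant", (False, True, holds)
    if grant:
        yield "recv_grant", (free, False, True)
    if holds:
        yield "release", (True, grant, False)

def find_disabled_step(init: State, init_phase: str,
                       edges: Dict[str, Dict[str, str]]) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search over (phase, state) pairs. Returns the (phase, action)
    trace of a reachable transition that the automaton disables, or None."""
    start: Node = (init_phase, init)
    parent: Dict[Node, Optional[Tuple[Node, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        phase, s = node
        for action, s2 in steps(s):
            if action not in edges.get(phase, {}):
                trace = [(phase, action)]
                while parent[node] is not None:  # walk back to the initial node
                    prev, prev_action = parent[node]
                    trace.append((prev[0], prev_action))
                    node = prev
                return list(reversed(trace))
            child = (edges[phase][action], s2)
            if child not in parent:
                parent[child] = (node, action)
                queue.append(child)
    return None

GOOD = {"idle": {"grant": "granted"}, "granted": {"recv_grant": "holding"},
        "holding": {"release": "idle"}}
MISSING = {"idle": {"grant": "granted"}, "granted": {"recv_grant": "holding"},
           "holding": {}}  # the release edge was forgotten
print(find_disabled_step((True, False, False), "idle", GOOD))     # None
print(find_disabled_step((True, False, False), "idle", MISSING))
# [('idle', 'grant'), ('granted', 'recv_grant'), ('holding', 'release')]
```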

Another type of failure is when an inductive phase invariant exists but the automaton 
does not direct the search well towards it. In this case the user may decide to terminate 
the analysis and articulate a different intuition via a different phase structure. In standard 
inference procedures, the only way to affect the search is by modifying the transition 
system; instead, phase structures equip the user with an ability to guide the search. 


6.2 Evaluation 


We evaluate our approach for user-guided invariant inference by comparing Phase-PDR∀
to standard PDR∀. We implemented PDR∀ and Phase-PDR∀ in mypyvy [2],
a new system for invariant inference inspired by Ivy [45], over Z3 [46]. We study:


1. Can Phase-PDR∀ converge to a proof when PDR∀ does not (in reasonable time)?
2. Is Phase-PDR∀ faster than PDR∀?
3. Which aspects of Phase-PDR∀ contribute to its performance benefits?


Protocols. We applied PDR∀ and Phase-PDR∀ to the most challenging examples admitting
universally quantified invariants, which previous works verified using deductive
techniques. The protocols we analyzed are listed below and in Table 1. The full mod- 
els appear in [1]. The KV-R protocol analyzed is taken from one of the two realistic 
systems studied by the IronFleet paper [33] using deductive verification. 
Phase Structures. The phase structures we used appear in [1]. In all our examples, it
was straightforward to translate the existing high-level intuition of important and
relevant distinctions between phases in the protocol into the phase structures we report.
For example, it took us less than an hour to finalize an automaton for KV-R. We emphasize
that phase structures do not include phase characterizations; the user need not supply
them, nor does the user need to understand the inference procedure. Our exposition of the
phase structures below refers to an intuitive meaning of each phase, but this is not part
of the phase structure provided to the tool.


Table 1. Running times in seconds of PDR∀ and Phase-PDR∀, presented as the mean and standard
deviation (in parentheses) over 16 different Z3 random seeds. “*” indicates that some runs did not
converge after 1 h and were not included in the summary statistics. “> 1 h” means that no runs of
the algorithm converged in 1 h. #p refers to the number of phases and #v to the number of view
quantifiers in the phase structure. #r refers to the number of relations and |a| to the maximal arity.
The remaining columns describe the inductive invariant/phase invariant obtained in inference. |f|
is the maximal frame reached. #c, #q are the mean number of clauses and quantifiers (excluding
view quantifiers) per phase, ranging across the different phases.


Program PDRY Phase-PDR” || #p | #v || #r | lal || Inductive Phase-inductive 

Ifl #c | #q Ifl #c #q 
Lock service 2.21 (00.03) 0.67 (0.01) 4/1 ]) 5] 1 11 9 | 15 6 3-4 3-4 
(single lock) 
Lock service 2.73 (00.02) 1.06 (0.01) 4/1 ])5]2 11 9 | 24 6 4 3-4 
(multiple locks) 
Consensus 60.54 (2.95) 1355 (570)* || 3 | 1 || 7 | 2 9 6 | 15 12 5-6 | 10-14 
KV (basic) 1.79 (0.02) 1.59 (0.02) 2) 14}, 3 43 5 7 | 27 5 4 9-10 
Ring leader 152.44 (39.41) | 2.53 (0.04) 2) 2}, 4)3 6-7 | 6 | Il 5 1-2 0-1 
KV-R 2070 (370)* 372.5 (35.9) || 2 | 1 || 7 | 5 || 12-15 | 24 | 156 | 11-13 | 5-11 | 15-67 
Cache >lh 90.1 (0.82) 10; 1 || 11) 2 n/a | n/a| n/a} 13 | 10-15 | 12-27 
coherence 


(1) Achieving Convergence Through Phases. In this section we consider the effect of 
phases on inference for examples on which standard PDR∀ does not converge within 1 h.
Examples. Sharded key-value store with retransmissions (KV-R): see Sect. 3 and Exam- 
ple 1. This protocol has not been modeled in decidable logic before. 


Cache Coherence. This example implements the classic MESI protocol for maintaining 
cache coherence in a shared-memory multiprocessor [36], modeled in decidable logic 
for the first time. Cores perform reads and writes to memory, and caches snoop on each 
other’s requests using a shared bus and maintain the invariant that there is at most one 
writer of a particular cache line. For simplicity, we consider only a single cache line, and 
yet the example is still challenging for PDR∀. Standard explanations of this protocol in
the literature already use automata to describe this invariant, and we directly exploit 
this structure in our phase automaton. Phase Structure: There are 10 phases in total, 
grouped into three parts corresponding to the modified, exclusive, and shared states
in the classical description. Within each group, there are additional phases for when a 
request is being processed by the bus. For example, in the shared group, there are phases 
for handling reads by cores without a copy of the cache line, writes by such cores, and 
also writes by cores that do have a copy. Overall, the phase structure is directly derived 
from textbook descriptions, taking into account that use of the shared bus is not atomic. 
Results and Discussion. Measurements for these examples appear in Table 1. Standard
PDR∀ fails to converge within an hour on 13 out of 16 seeds for KV-R and on all 16
seeds for the cache. In contrast, Phase-PDR∀ converges to a proof in a few minutes in all
cases. These results demonstrate that phase structures can effectively guide the search 
and obtain an invariant quickly where standard inductive invariant inference does not. 


(2) Enhancing Performance Through Phases. In this section we consider the use of 
phase structures to improve the speed of convergence to a proof. 

Examples. Distributed lock service, adapted from [61], allows clients to acquire and 
release locks by sending requests to a central server, which guarantees that only one 
client holds each lock at a time. Phase structure: for each lock, the phases follow the 
4 steps by which a client completes a cycle of acquire and release. We also consider a 
simpler variant with only a single lock, reducing the arity of all relations and removing 
the need for an automaton view. Its phase structure is the same, only for a single lock. 


Simple quorum-based consensus, based on the example in [60]. In this protocol, nodes 
propose themselves and then receive votes from other nodes. When a quorum of votes 
for a node is obtained, it becomes the leader and decides on a value. Safety requires that 
decided values are unique. The phase structure distinguishes between the phases before 
any node is elected leader, once a node is elected, and when values are decided. Note 
that the automaton structure is unquantified. 


Leader election in a ring [13,51], in which nodes are organized in a directional ring 
topology with unique IDs, and the safety property is that an elected leader is a node 
with the highest ID. Phase structure: for a view of two nodes n1, n2, in the first phase, 
messages with the ID of n1 are yet to advance in the ring past n2, while in the second
phase, a message advertising n1 has advanced past n2. The inferred characterizations
include another quantifier on nodes, constraining interference (see Sect. 7). 


Sharded key-value store (KV) is a simplified version of KV-R above, without mes- 
sage drops and the retransmission mechanism. The phase structure is exactly as in 
KV-R, omitting transitions related to sequence numbers and acknowledgment. This pro- 
tocol has not been modeled in decidable logic before. 


Results and Discussion. We compare the performance of standard PDR∀ and
Phase-PDR∀ on the above examples, with results shown in Table 1. For each example, we ran
the two algorithms on 16 different Z3 random seeds. Measurements were performed 
on a 3.4GHz AMD Ryzen Threadripper 1950X with 16 physical cores, running Linux 
4.15.0, using Z3 version 4.7.1. By disabling hyperthreading and frequency scaling and 
pinning tasks to dedicated cores, variability across runs of a single seed was negligible. 

In all but one example, Phase-PDR∀ improves performance, sometimes drastically;
for example, performance for leader election in a ring is improved by a factor of 60.
Phase-PDR∀ also improves the robustness of inference [27] on this example, as the
standard deviation falls from 39 in PDR∀ to 0.04 in Phase-PDR∀.

The only example in which a phase structure actually diminishes inference effec- 
tiveness is simple consensus. We attribute this to an automaton structure that does not 
capture the essence of the correctness argument very well, overlooking votes and
quorums. This demonstrates that a phase structure might guide the search towards
counterproductive directions if the user guidance is “misleading”. This suggests that a
more resilient interactive inference framework could be obtained by combining phase-based
inference with standard inductive-invariant-based reasoning. We are not aware of
a single “good” automaton for this example. The correctness argument of this example 
is better captured by the conjunction of two automata (one for votes and one for accu- 
mulating a quorum) with different views, but the problem of inferring phase invariants 
for mutually-dependent automata is a subject for future work. 


(3) Anatomy of the Benefit of Phases. We now demonstrate that each of the beneficial 
aspects of phases discussed in Sect. 5.2 is important for the benefits reported above. 


Phase Decomposition. Is there a benefit from a phase structure even without disabled 
transitions? Leader election in a ring answers this question positively: it demonstrates
a large performance benefit even without disabled transitions.


Disabled Transitions. Is there a substantial gain from exploiting disabled transitions? 
We compare Phase-PDR∀ on the structure with disabled transitions and a structure
obtained by (artificially) adding self loops labeled with the originally impossible
transitions, on the example of lock service with multiple locks (Sect. 6.2), since it
demonstrates a performance benefit using Phase-PDR∀ and showcases several disabled
transitions in each phase. The result is that without disabled transitions, the mean
running time of Phase-PDR∀ on this example jumps from 2.73 s to 6.24 s. This
demonstrates the utility of the additional safety properties encompassed in disabled transitions.


Phase-Awareness. Is it important to treat phases explicitly in the inference algorithm,
as we do in Phase-PDR∀ (Sect. 6.1)? We compare our result on convergence of KV-R
with an alternative in which standard PDR∀ is applied to an encoding of the phase
decomposition and disabled transitions by ghost state: each phase is modeled by a
relation over possible view assignments, and the model is augmented with update code
mimicking phase changes; the additional safety properties derived from disabled
transitions are provided; and the view and the appropriate modification of the safety
property are introduced. This translation expresses all information present in the phase
structure, but does not explicitly guide the inference algorithm to use this information.
The result is that with this ghost-based modeling the phase-oblivious PDR∀ does not
converge in 1 h on KV-R in any of the 16 runs, whereas inference converges when
Phase-PDR∀ explicitly directs the search using the phase structure.


7 Related Work 


Phases in Distributed Protocols. Distributed protocols are frequently described in 
informal descriptions as transitioning between different phases. Recently, PSync [19] 
used the Heard-Of model [14], which describes protocols as operating in rounds, as
a basis for the implementation and verification of fault-tolerant distributed protocols.
Typestates (e.g. [25,59]) also bear some similarity to the temporal aspect of phases.
State machine refinement [3,28] is used extensively in the design and verification of 
distributed systems (see e.g. [33,47]). The automaton structure of a phase invariant is 
also a form of state machine; our focus is on inference of characterizations establishing 
this. 


Interaction in Verification. Interactive proof assistants such as Coq [8] and 
Isabelle/HOL [48] interact with users to aid them as they attempt to prove candidate 
inductive invariants. This differs from interaction through phase structures and coun- 
terexample traces. Ivy uses interaction for invariant inference by interactive generaliza- 
tion from counterexamples [51]. This approach is less automatic as it requires interac- 
tion for every clause of the inductive invariant. In terminology from synthesis [30], the 
use of counterexamples is synthesizer-driven interaction with the tool, while interaction 
via phase structures is mainly user-driven. Abstract counterexample traces returned by 
the tool augment this kind of interaction. As [38] has shown, interactive invariant
inference, when considered as a synthesis problem (see also [27,55]), is related to
inductive learning.


Template-Based Invariant Inference. Many works employ syntactical templates for 
invariants, used to constrain the search (e.g. [7,16,54,57,58]). The different phases in a
phase structure induce a disjunctive form, but crucially each disjunct also has a distinct 
semantic meaning, which inference overapproximates, as explained in Sect. 5.2. 


Automata in Safety Verification. Safety verification through an automaton-like refine- 
ment of the program’s control has been studied in a number of works. We focus on 
related techniques for proof automation. The Automizer approach to the verification 
of sequential programs [34,35] is founded on the notion of a Floyd-Hoare automa- 
ton, which is an unquantified inductive phase automaton; an extension to parallel pro- 
grams [22] uses thread identifiers closed under the symmetry rule, which are related 
to view quantifiers. Their focus is on the automatic, incremental construction of such 
automata as a union of simpler automata, where each automaton is obtained from 
generalizing the proof/infeasibility of a single trace. In our approach the structure of 
the automaton is provided by the user as a means of conveying their intuition of the 
proof, while the annotations are computed automatically. A notable difference is that 
in Automizer, the generation of characterizations in an automaton constructed from a 
single trace does not utilize the phase structure (beyond that of the trace), whereas in 
our approach the phase structure is central in generalization from states to character- 
izations. In trace partitioning [44,53], abstract domains based on transition systems 
partitioning the program’s control are introduced. The observation is that recording his- 
torical information forms a basis for case-splitting, as an alternative to fully-disjunctive 
abstractions. This differs from our motivation of distinguishing between different pro- 
tocol phases. The phase structure of the domain is determined by the analyser, and can 
also be dynamic. In our work the phase structure is provided by the user as guidance. We 
use a variant of PDR∀, rather than abstract interpretation [17], to compute universally
quantified phase characterizations. Techniques such as predicate abstraction [26,29] 
and existential abstraction [15], as well as the safety part of predicate diagrams [11],
use finite languages for the set of possible characterizations and lack the notion of views,
both essential for handling unbounded numbers of processes and resources. Finally,
phase splitter predicates [56] share our motivation of simplifying invariant inference
by exposing the different phases the loop undergoes. Splitter predicates correspond to 
inductive phase characterizations [56, Theorem 1], and are automatically constructed 
according to program conditionals. In our approach, decomposition is performed by 
the user using potentially non-inductive conditions, and the inductive phase charac- 
terizations are computed by invariant inference. Successive loop splitting results in a 
sequence of phases, whereas our approach utilizes arbitrary automaton structures.
Borralleras et al. [9] also refine the control-flow graph throughout the analysis by splitting
on conditions, which are discovered as preconditions for termination (the motivation 
is to expose termination proof goals to be established): in a sense, the phase structure 
is grown from candidate characterizations implying termination. This differs from our 
approach in which the phase structure is used to guide the inference of characterizations. 


Quantified Invariant Inference. We focus here on the works on quantifiers in auto- 
matic verification most closely related to our work. In predicate abstraction, quanti- 
fiers can be used internally as part of the definitions of predicates, and also externally 
through predicates with free variables [26,42]. Our work uses quantifiers both internally 
in phase characterizations and externally in view quantifiers. The view is also related
to the bounded number of quantifiers used in view abstraction [5,6]. In this work we 
observe that it is useful to consider views of entities beyond processes or threads, such 
as a single key in the store. Quantifiers are often used to their full extent in verification 
conditions, namely checking implication between two quantified formulas, but they are 
sometimes employed in weaker checks as part of thread-modular proofs [4,39]. This 
amounts to searching for invariants provable using specific instantiations of the quan- 
tifiers in the verification conditions [31,37]. In our verification conditions, the view 
quantifiers are localized, in effect performing a single instantiation. This is essential for 
exploiting the disjunctive structure under the quantifiers, allowing inference to consider 
a single automaton edge in each step, and reflecting an intuition of correctness. When 
necessary to constrain interference, quantifiers in phase characterizations can be used 
to establish necessary facts about interfering views. Finally, there exist algorithms other 
than PDR∀ for solving CHCs by predicates with universal invariants (e.g. [20,32]).


8 Conclusion 


Invariant inference techniques aiming to verify intricate distributed protocols must 
adjust to the diverse correctness arguments on which protocols are based. In this paper 
we have proposed to use phase structures as means of conveying users’ intuition of 
the proof, to be used by an automatic inference tool as a basis for a full formal proof. 
We found that inference guided by a phase structure can infer proofs for distributed 
protocols that are beyond the reach of state-of-the-art inductive invariant inference
methods, and can also improve the speed of convergence. The phase decomposition induced
by the automaton, the use of disabled transitions, and the explicit treatment of phases 
in inference, all combine to direct the search for the invariant. We are encouraged by 
our experience of specifying phase structures for different protocols. It would be inter-
esting to integrate the interaction via phase structures with other verification methods 
and proof logics, as well as interaction schemes based on different, complementary, 
concepts. Another important direction for future work is inference beyond universal 
invariants, required for example for the proof of Paxos [50]. 
Abstract. We consider the problem of whether termination of affine integer
loops is decidable. Since Tiwari conjectured decidability in 2004 [15],
only special cases have been solved [3,4,14]. We complement this work 
by proving decidability for the case that the update matrix is triangular. 


1 Introduction 


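We consider affine integer loops of the form

$$\textbf{while } \varphi \textbf{ do } \vec{x} \leftarrow A\,\vec{x} + \vec{a}. \qquad (1)$$

Here, $A \in \mathbb{Z}^{d \times d}$ for some dimension $d \geq 1$, $\vec{x}$ is a column vector of pairwise
different variables $x_1, \ldots, x_d$, $\vec{a} \in \mathbb{Z}^d$, and $\varphi$ is a conjunction of inequalities of
the form $\alpha > 0$ where $\alpha \in \mathcal{A}f[\vec{x}]$ is an affine expression with rational coefficients¹
over $\vec{x}$ (i.e., $\mathcal{A}f[\vec{x}] = \{\vec{c}^{\,T} \vec{x} + c \mid \vec{c} \in \mathbb{Q}^d, c \in \mathbb{Q}\}$). So $\varphi$ has the form $B\,\vec{x} + \vec{b} > \vec{0}$
where $\vec{0}$ is the vector containing $k$ zeros, $B \in \mathbb{Q}^{k \times d}$, and $\vec{b} \in \mathbb{Q}^k$ for some $k \in \mathbb{N}$.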
Definition 1 formalizes the intuitive notion of termination for such loops. 
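Definition 1 (Termination). Let $f: \mathbb{Z}^d \to \mathbb{Z}^d$ with $f(\vec{x}) = A\,\vec{x} + \vec{a}$. If

$$\exists \vec{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\vec{x}/f^n(\vec{c})],$$

then (1) is non-terminating and $\vec{c}$ is a witness for non-termination. Otherwise,
(1) terminates.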
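Definition 1 quantifies over all $n \in \mathbb{N}$, so no finite computation can confirm a witness directly. The following minimal Python sketch (the function names and the encoding of $\varphi$ as a pair $(B, \vec{b})$ with guard $B\,\vec{x} + \vec{b} > \vec{0}$ are our own choices) only evaluates the orbit $\vec{c}, f(\vec{c}), f^2(\vec{c}), \ldots$ up to a bound and reports the first $n$ at which $\varphi$ fails. It can refute a candidate witness, but a `None` result proves nothing.

```python
from fractions import Fraction
from typing import List, Optional, Sequence

def apply_update(A: Sequence[Sequence[int]], a: Sequence[int], x: Sequence[int]) -> List[int]:
    """One application of the update f(x) = A x + a."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + a[i] for i in range(len(A))]

def guard_holds(B, b, x) -> bool:
    """phi(x) for phi = (B x + b > 0): every row must be strictly positive.
    B and b may contain rationals, so Fraction keeps the arithmetic exact."""
    return all(sum(Fraction(B[r][j]) * x[j] for j in range(len(x))) + Fraction(b[r]) > 0
               for r in range(len(B)))

def first_violation(A, a, B, b, c, bound: int) -> Optional[int]:
    """Smallest n <= bound such that phi[x/f^n(c)] is false, or None if phi holds
    for f^0(c), ..., f^bound(c). None does NOT show that c is a witness."""
    x = list(c)
    for n in range(bound + 1):
        if not guard_holds(B, b, x):
            return n
        x = apply_update(A, a, x)
    return None

# while x > 0 do x <- x - 1, started in c = (5): the guard first fails at n = 5.
print(first_violation([[1]], [-1], [[1]], [0], [5], bound=100))  # 5
```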


Here, $f^n$ denotes the $n$-fold application of $f$, i.e., we have $f^0(\vec{c}) = \vec{c}$ and
$f^{n+1}(\vec{c}) = f(f^n(\vec{c}))$. We call $f$ the update of (1). Moreover, for any entity $s$, $s[x/t]$
denotes the entity that results from $s$ by replacing all occurrences of $x$ by $t$.
Similarly, if $\vec{x} = (x_1, \ldots, x_m)^T$ and $\vec{t} = (t_1, \ldots, t_m)^T$, then $s[\vec{x}/\vec{t}]$ denotes the entity resulting from $s$
by replacing all occurrences of $x_i$ by $t_i$ for each $1 \leq i \leq m$.


1 Note that multiplying with the least common multiple of all denominators yields 
an equivalent constraint with integer coefficients, i.e., allowing rational instead of 
integer coefficients does not extend the considered class of loops. 
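Example 2. Consider the loop

$$\textbf{while } y + z > 0 \textbf{ do } \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} \leftarrow \begin{pmatrix} 2 \\ x + 1 \\ -w - 2 \cdot y \\ x \end{pmatrix}$$

where the update of all variables is executed simultaneously. This program
belongs to our class of affine loops, because it can be written equivalently as
follows.

$$\textbf{while } y + z > 0 \textbf{ do } \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} \leftarrow \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & -2 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}$$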
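As a quick sanity check of this rewriting, the following Python sketch compares the simultaneous assignment of Example 2 with $A\,\vec{x} + \vec{a}$ on a finite grid of integer vectors (the grid bounds are an arbitrary choice of ours). Agreement on a grid is only evidence; equality for all inputs follows from reading off the coefficients row by row.

```python
import itertools

# Matrix form of Example 2 (variable order w, x, y, z).
A = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [-1, 0, -2, 0],
     [0, 1, 0, 0]]
a = [2, 1, 0, 0]

def matrix_update(v):
    """x <- A x + a."""
    return [sum(A[i][j] * v[j] for j in range(4)) + a[i] for i in range(4)]

def simultaneous_update(v):
    """The original loop body; every right-hand side reads the old values."""
    w, x, y, z = v
    return [2, x + 1, -w - 2 * y, x]

# Compare both forms on every vector with entries in -3..3.
grid = itertools.product(range(-3, 4), repeat=4)
print(all(matrix_update(list(v)) == simultaneous_update(v) for v in grid))  # True
```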


While termination of affine loops is known to be decidable if the variables
range over the real [15] or rational numbers [4], the integer case is a well-known
open problem [2–4,14,15].² However, certain special cases have been
solved: Braverman [4] showed that termination of linear loops is decidable (i.e.,
loops of the form (1) where $\vec{a}$ is $\vec{0}$ and $\varphi$ is of the form $B\,\vec{x} > \vec{0}$). Bozga et al. [3]
showed decidability for the case that the update matrix $A$ in (1) has the finite
monoid property, i.e., if there is an $n > 0$ such that $A^n$ is diagonalizable and all
eigenvalues of $A^n$ are in $\{0, 1\}$. Ouaknine et al. [14] proved decidability for the
case $d \leq 4$ and for the case that $A$ is diagonalizable.

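Ben-Amram et al. [2] showed undecidability of termination for certain extensions
of affine integer loops, e.g., for loops where the body is of the form
$\textbf{if } x > 0 \textbf{ then } \vec{x} \leftarrow A\,\vec{x} \textbf{ else } \vec{x} \leftarrow A'\,\vec{x}$ where $A, A' \in \mathbb{Z}^{d \times d}$ and $x \in \vec{x}$.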

In this paper, we present another substantial step towards the solution of the 
open problem whether termination of affine integer loops is decidable. We show 
that termination is decidable for triangular loops (1) where A is a triangular 
matrix (i.e., all entries of A below or above the main diagonal are zero). Clearly, 
the order of the variables is irrelevant, i.e., our results also cover the case that A 
can be transformed into a triangular matrix by reordering $A$, $\vec{x}$, and $\vec{a}$
accordingly.³ So essentially, triangularity means that the program variables $x_1, \ldots, x_d$
can be ordered such that in each loop iteration, the new value of $x_i$ only depends on
the previous values of $x_1, \ldots, x_{i-1}, x_i$. Hence, this excludes programs with “cyclic
dependencies” of variables (e.g., where the new values of x and y both depend
on the old values of both x and y). While triangular loops are a very restricted
subclass of general integer programs, integer programs often contain such loops.
Hence, tools for termination analysis of such programs (e.g., [5–8,11–13]) could


2 The proofs for real or rational numbers do not carry over to the integers since [15]
uses Brouwer’s Fixed Point Theorem, which is not applicable if the variables range
over $\mathbb{Z}$, and [4] relies on the density of $\mathbb{Q}$ in $\mathbb{R}$.

3 Similarly, one could of course also use other termination-preserving pre-processings 
and try to transform a given program into a triangular loop. 
benefit from integrating our decision procedure and applying it whenever a
subprogram is an affine triangular loop.

Note that triangularity and diagonalizability of matrices do not imply each 
other. As we consider loops with arbitrary dimension, this means that the class 
of loops considered in this paper is not covered by [3,14]. Since we consider affine 
instead of linear loops, it is also orthogonal to [4]. 

To see the difference between our and previous results, note that a triangular 
matrix $A$ where $c_1, \ldots, c_k$ are the distinct entries on the diagonal is diagonalizable
iff $(A - c_1 I) \cdots (A - c_k I)$ is the zero matrix.⁴ Here, $I$ is the identity matrix.
So an easy example for a triangular loop where the update matrix is not diago- 
nalizable is the following well-known program (see, e.g., [2]): 


$$\textbf{while } x > 0 \textbf{ do } x \leftarrow x + y;\ y \leftarrow y - 1$$

It terminates as $y$ eventually becomes negative and then $x$ decreases in each
iteration. In matrix notation, the loop body is $\begin{pmatrix} x \\ y \end{pmatrix} \leftarrow \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \end{pmatrix}$, i.e.,
the update matrix is triangular. Thus, this program is in our class of programs
where we show that termination is decidable. However, the only entry on the
diagonal of the update matrix $A$ is $c = 1$ and $A - cI = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is not the zero
matrix. So $A$ (and in fact each $A^n$ where $n \geq 1$) is not diagonalizable. Hence,
extensions of this example to a dimension greater than 4 where the loop is still
triangular are not covered by any of the previous results.⁵
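The criterion for triangular matrices is easy to check mechanically. A minimal Python sketch over exact integer arithmetic (the function names are ours) multiplies the factors $A - c_i I$ for the distinct diagonal entries $c_i$; since these factors are polynomials in $A$, they commute, so their order does not matter.

```python
from typing import List

Matrix = List[List[int]]

def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def is_triangular(A: Matrix) -> bool:
    """True if all entries above, or all entries below, the main diagonal are zero."""
    n = len(A)
    lower = all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))
    upper = all(A[i][j] == 0 for i in range(n) for j in range(i))
    return lower or upper

def triangular_is_diagonalizable(A: Matrix) -> bool:
    """With c_1, ..., c_k the distinct diagonal entries of triangular A:
    A is diagonalizable iff (A - c_1 I) ... (A - c_k I) is the zero matrix."""
    if not is_triangular(A):
        raise ValueError("the criterion is only stated for triangular matrices")
    n = len(A)
    product = [[int(i == j) for j in range(n)] for i in range(n)]  # start from I
    for c in sorted({A[i][i] for i in range(n)}):
        shifted = [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]
        product = mat_mul(product, shifted)
    return all(entry == 0 for row in product for entry in row)

print(triangular_is_diagonalizable([[1, 1], [0, 1]]))  # False: the example's update matrix
print(triangular_is_diagonalizable([[1, 5], [0, 2]]))  # True: distinct diagonal entries
```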

Our proof that termination is decidable for triangular loops proceeds in three 
steps. We first prove that termination of triangular loops is decidable iff termina- 
tion of non-negative triangular loops (nnt-loops) is decidable, cf. Sect. 2. A loop 
is non-negative if the diagonal of A does not contain negative entries. Second, we 
show how to compute closed forms for nnt-loops, i.e., vectors $\vec{q}$ of $d$ expressions
over the variables $\vec{x}$ and $n$ such that $\vec{q}[n/c] = f^c(\vec{x})$ for all $c \geq 0$, see Sect. 3.
Here, triangularity of the matrix $A$ allows us to treat the variables step by step.
So for any $1 \leq i \leq d$, we already know the closed forms for $x_1, \ldots, x_{i-1}$ when
computing the closed form for $x_i$. The idea of computing closed forms for the
repeated updates of loops was inspired by our previous work on inferring lower 
bounds on the runtime of integer programs [10]. But in contrast to [10], here 
the computation of the closed form always succeeds due to the restricted shape 
of the programs. Finally, we explain how to decide termination of nnt-loops by 
reasoning about their closed forms in Sect. 4. While our technique does not yield 
witnesses for non-termination, we show that it yields witnesses for eventual
non-termination, i.e., vectors $\vec{c}$ such that $f^n(\vec{c})$ witnesses non-termination for some
$n \in \mathbb{N}$. Detailed proofs for all lemmas and theorems can be found in [9].


4 The reason is that in this case, $(x - c_1) \cdots (x - c_k)$ is the minimal polynomial of
A and diagonalizability is equivalent to the fact that the minimal polynomial is a 
product of distinct linear factors. 

5 For instance, consider $\textbf{while } x > 0 \textbf{ do } x \leftarrow x + y + z_1 + z_2 + z_3;\ y \leftarrow y - 1$.
2 From Triangular to Non-Negative Triangular Loops


To transform triangular loops into nnt-loops, we define how to chain loops. 
Intuitively, chaining yields a new loop where a single iteration is equivalent to 
two iterations of the original loop. Then we show that chaining a triangular loop 
always yields an nnt-loop and that chaining is equivalent w.r.t. termination. 


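Definition 3 (Chaining). Chaining the loop (1) yields:

$$\textbf{while } \varphi \wedge \varphi[\vec{x}/A\,\vec{x} + \vec{a}] \textbf{ do } \vec{x} \leftarrow A^2\,\vec{x} + A\,\vec{a} + \vec{a} \qquad (2)$$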
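Example 4. Chaining Example 2 yields

$$\textbf{while } y + z > 0 \wedge -w - 2 \cdot y + x > 0 \textbf{ do } \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} \leftarrow \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & -2 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}^{2} \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & -2 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}$$

which simplifies to the following nnt-loop:

$$\textbf{while } y + z > 0 \wedge -w - 2 \cdot y + x > 0 \textbf{ do } \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} \leftarrow \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 2 & 0 & 4 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 2 \\ 2 \\ -2 \\ 1 \end{pmatrix}$$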
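The chained update can be computed mechanically. The following Python sketch (the helper names are ours) forms $A^2$ and $A\,\vec{a} + \vec{a}$ for the update of Example 2 and reproduces the nnt-loop of Example 4; the diagonal $0, 1, 4, 0$ of $A^2$ is non-negative, as Lemma 5 predicts.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]

def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    """Product X * Y of integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def mat_vec(X: Matrix, v: Vector) -> Vector:
    """Product X * v."""
    return [sum(X[i][j] * v[j] for j in range(len(v))) for i in range(len(X))]

def chained_update(A: Matrix, a: Vector):
    """Matrix and offset of the chained update x <- A^2 x + (A a + a) from Definition 3."""
    offset = [u + w for u, w in zip(mat_vec(A, a), a)]
    return mat_mul(A, A), offset

# Update of Example 2 (variable order w, x, y, z).
A = [[0, 0, 0, 0], [0, 1, 0, 0], [-1, 0, -2, 0], [0, 1, 0, 0]]
a = [2, 1, 0, 0]
A2, offset = chained_update(A, a)
print(A2)      # [[0, 0, 0, 0], [0, 1, 0, 0], [2, 0, 4, 0], [0, 1, 0, 0]]
print(offset)  # [2, 2, -2, 1]
```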


Lemma 5 is needed to prove that (2) is an nnt-loop if (1) is triangular. 


Lemma 5 (Squares of Triangular Matrices). For every triangular matrix 
$A$, $A^2$ is a triangular matrix whose diagonal entries are non-negative.
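For the diagonal entries in Lemma 5, a one-line calculation suffices; the following is our own sketch (the detailed proof is in [9]). Assume $A$ is lower triangular, the upper triangular case being symmetric. Then $A_{ik} = 0$ for $k > i$ and $A_{ki} = 0$ for $k < i$, so only $k = i$ contributes to the $i$-th diagonal entry of $A^2$:

$$(A^2)_{ii} = \sum_{k=1}^{d} A_{ik}\,A_{ki} = A_{ii}^2 \geq 0.$$

Triangularity of $A^2$ holds because products of lower (resp. upper) triangular matrices are again lower (resp. upper) triangular.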


Corollary 6 (Chaining Loops). If (1) is triangular, then (2) is an nnt-loop. 


Proof. Immediate consequence of Definition 3 and Lemma 5. 
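Lemma 7 (Equivalence of Chaining). (1) terminates $\iff$ (2) terminates.
Proof. By Definition 1, (1) does not terminate iff

$$\begin{aligned}
& \exists \vec{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\vec{x}/f^{n}(\vec{c})] \\
\iff\ & \exists \vec{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\vec{x}/f^{2n}(\vec{c})] \wedge \varphi[\vec{x}/f^{2n+1}(\vec{c})] \\
\iff\ & \exists \vec{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\vec{x}/f^{2n}(\vec{c})] \wedge \varphi[\vec{x}/A\,f^{2n}(\vec{c}) + \vec{a}] \quad \text{(by definition of } f\text{)},
\end{aligned}$$

i.e., iff (2) does not terminate as $f^2(\vec{x}) = A^2\,\vec{x} + A\,\vec{a} + \vec{a}$ is the update of (2).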
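Lemma 7 can be observed on the loop from the introduction, $\textbf{while } x > 0 \textbf{ do } x \leftarrow x + y;\ y \leftarrow y - 1$. The following Python sketch (helper names are ours) hard-codes its chained version, whose guard conjunct $x + y > 0$ and update $A^2\,\vec{x} + A\,\vec{a} + \vec{a}$ we computed by hand from Definition 3, and runs both loops from a grid of start values. Beyond agreeing on termination, the chained loop performs exactly $\lfloor m/2 \rfloor$ iterations when the original performs $m$, since it needs the guard to hold at two consecutive states.

```python
from typing import Callable, List, Optional, Sequence

def iterations(A: Sequence[Sequence[int]], a: Sequence[int],
               guard: Callable[[List[int]], bool], c: Sequence[int], bound: int) -> Optional[int]:
    """Number of loop iterations from c before the guard fails, or None if the guard
    still holds after `bound` iterations (a bounded run proves nothing about termination)."""
    x = list(c)
    for n in range(bound + 1):
        if not guard(x):
            return n
        x = [sum(A[i][j] * x[j] for j in range(len(x))) + a[i] for i in range(len(A))]
    return None

# Original loop (1): while x > 0 do x <- x + y; y <- y - 1
A, a = [[1, 1], [0, 1]], [0, -1]
guard1 = lambda v: v[0] > 0
# Chained loop (2): guard phi /\ phi[x/Ax+a], update A^2 x + A a + a (computed by hand)
A2, b2 = [[1, 2], [0, 1]], [-1, -2]
guard2 = lambda v: v[0] > 0 and v[0] + v[1] > 0

for c in [(1, 2), (5, -1), (0, 3)]:
    m = iterations(A, a, guard1, c, bound=10_000)
    m2 = iterations(A2, b2, guard2, c, bound=10_000)
    print(c, m, m2)
```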


Theorem 8 (Reducing Termination to nnt-Loops). Termination of tri- 
angular loops is decidable iff termination of nnt-loops is decidable. 


Proof. Immediate consequence of Corollary 6 and Lemma 7. 


Thus, from now on we restrict our attention to nnt-loops. 
3 Computing Closed Forms


The next step towards our decidability proof is to show that $f^n(\vec{x})$ is equivalent
to a vector of poly-exponential expressions for each nnt-loop, i.e., the closed form 
of each nnt-loop can be represented by such expressions. Here, equivalence means 
that two expressions evaluate to the same result for all variable assignments. 
Poly-exponential expressions are sums of arithmetic terms where it is always 
clear which addend determines the asymptotic growth of the whole expression 
when increasing a designated variable n. This is crucial for our decidability 
proof in Sect. 4. Let Ns; = {b € N | b > 1} (and Qyo, Ny, etc. are defined 
analogously). Moreover, Af[%] is again the set of all affine expressions over 7. 


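Definition 9 (Poly-Exponential Expressions). Let $C$ be the set of all finite conjunctions over the literals $n = c$, $n \neq c$ where $n$ is a designated variable and $c \in \mathbb{N}$. Moreover, for each formula $\psi$ over $n$, let $[\![\psi]\!]$ be the characteristic function of $\psi$, i.e., $[\![\psi]\!](c) = 1$ if $\psi[n/c]$ is valid and $[\![\psi]\!](c) = 0$ otherwise. The set of all poly-exponential expressions over $\bar{x}$ is

$$\mathrm{PE}[\bar{x}] = \left\{ \sum_{j=1}^{\ell} [\![\psi_j]\!] \cdot \alpha_j \cdot n^{a_j} \cdot b_j^n \;\middle|\; \ell, a_j \in \mathbb{N},\ \psi_j \in C,\ \alpha_j \in \mathrm{Af}[\bar{x}],\ b_j \in \mathbb{N}_{\geq 1} \right\}.$$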


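As $n$ ranges over $\mathbb{N}$, we use $[\![n > c]\!]$ as syntactic sugar for $[\![\bigwedge_{i=0}^{c} n \neq i]\!]$. So an example for a poly-exponential expression is

$$[\![n > 2]\!] \cdot (2 \cdot x + 3 \cdot y - 1) \cdot n^3 \cdot 3^n + [\![n = 2]\!] \cdot (x - y).$$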


Moreover, note that if $\psi$ contains a positive literal (i.e., a literal of the form "$n = c$" for some number $c \in \mathbb{N}$), then $[\![\psi]\!]$ is equivalent to either $0$ or $[\![n = c]\!]$.

The crux of the proof that poly-exponential expressions can represent closed forms is to show that certain sums over products of exponential and poly-exponential expressions can be represented by poly-exponential expressions, cf. Lemma 12. To construct these expressions, we use a variant of [1, Lemma 3.5]. As usual, $\mathbb{Q}[\bar{x}]$ is the set of all polynomials over $\bar{x}$ with rational coefficients.


Lemma 10 (Expressing Polynomials by Differences [1]). If $q \in \mathbb{Q}[n]$ and $c \in \mathbb{Q}$, then there is an $r \in \mathbb{Q}[n]$ such that $q = r - c \cdot r[n/n-1]$ for all $n \in \mathbb{N}$.


So Lemma 10 expresses a polynomial $q$ via the difference of another polynomial $r$ at the positions $n$ and $n-1$, where the additional factor $c$ can be chosen freely. The proof of Lemma 10 is by induction on the degree of $q$, and its structure resembles the structure of the following algorithm to compute $r$. Using the Binomial Theorem, one can verify that $q - s + c \cdot s[n/n-1]$ has a smaller degree than $q$. The reason is that the choice of $s$ in Algorithm 1 ensures that $s - c \cdot s[n/n-1]$ has the same leading term $c_d \cdot n^d$ as $q$. This degree reduction is crucial for the proof of Lemma 10 and for the termination of Algorithm 1.
Algorithm 1. compute_r
Input: $q = \sum_{i=0}^{d} c_i \cdot n^i \in \mathbb{Q}[n]$, $c \in \mathbb{Q}$
Result: $r \in \mathbb{Q}[n]$ such that $q = r - c \cdot r[n/n-1]$
if $d = 0$ then
    if $c = 1$ then return $c_0 \cdot n$ else return $\frac{c_0}{1-c}$
else
    if $c = 1$ then $s \leftarrow \frac{c_d \cdot n^{d+1}}{d+1}$ else $s \leftarrow \frac{c_d \cdot n^d}{1-c}$
    return $s$ + compute_r($q - s + c \cdot s[n/n-1]$, $c$)


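Example 11. As an example, consider $q = 1$ (i.e., $c_0 = 1$) and $c = 4$. Then we search for an $r$ such that $q = r - c \cdot r[n/n-1]$, i.e., $1 = r - 4 \cdot r[n/n-1]$. According to Algorithm 1, the solution is $r = \frac{c_0}{1-c} = -\frac{1}{3}$. Indeed, $-\frac{1}{3} - 4 \cdot \left(-\frac{1}{3}\right) = 1$.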
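A direct transcription of Algorithm 1 into Python makes its termination easy to see: each recursive call receives a polynomial of smaller degree. This is a sketch under the following assumptions:

- Polynomials are coefficient lists of `Fraction`s, where index $i$ holds the coefficient of $n^i$.
- The helper names `trim`, `add`, `shift`, and `evaluate` are hypothetical.

```python
from fractions import Fraction
from math import comb

# Polynomials in n are lists of Fractions: p[i] is the coefficient of n**i.

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    """Sum of two coefficient lists."""
    size = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (size - len(p))
    q = list(q) + [Fraction(0)] * (size - len(q))
    return trim([u + v for u, v in zip(p, q)])

def shift(p):
    """Coefficients of p[n/n-1], expanded with the Binomial Theorem."""
    out = [Fraction(0)] * len(p)
    for i, ci in enumerate(p):
        for k in range(i + 1):
            out[k] += ci * comb(i, k) * (-1) ** (i - k)
    return out

def evaluate(p, n):
    """Value of the polynomial p at the integer n."""
    return sum(ci * Fraction(n) ** i for i, ci in enumerate(p))

def compute_r(q, c):
    """Algorithm 1: return r with q = r - c * r[n/n-1]."""
    q, c = trim([Fraction(u) for u in q]), Fraction(c)
    d = len(q) - 1
    if d == 0:
        return [Fraction(0), q[0]] if c == 1 else [q[0] / (1 - c)]
    if c == 1:
        s = [Fraction(0)] * (d + 1) + [q[d] / (d + 1)]  # c_d * n^(d+1) / (d+1)
    else:
        s = [Fraction(0)] * d + [q[d] / (1 - c)]        # c_d * n^d / (1-c)
    rest = add(add(q, [-u for u in s]), [c * u for u in shift(s)])  # degree < d
    return add(s, compute_r(rest, c))

print(compute_r([1], 4))  # [Fraction(-1, 3)], as in Example 11
```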


Lemma 12 (Closure of PE under Sums of Products and Exponentials). If $m \in \mathbb{N}$ and $p \in \mathrm{PE}[\bar{x}]$, then one can compute a $q \in \mathrm{PE}[\bar{x}]$ which is equivalent to $\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]$.


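Proof. Let $p = \sum_{j=1}^{\ell} [\![\psi_j]\!] \cdot \alpha_j \cdot n^{a_j} \cdot b_j^n$. We have:

$$\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1] = \sum_{j=1}^{\ell} \sum_{i=1}^{n} [\![\psi_j]\!](i-1) \cdot m^{n-i} \cdot \alpha_j \cdot (i-1)^{a_j} \cdot b_j^{i-1}. \tag{3}$$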


As $\mathrm{PE}[\bar{x}]$ is closed under addition, it suffices to show that we can compute an equivalent poly-exponential expression for any expression of the form

$$\sum_{i=1}^{n} [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^{a} \cdot b^{i-1}. \tag{4}$$
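We first regard the case $m = 0$. Here, $0^{n-i}$ is $1$ for $i = n$ and $0$ for $i < n$, so the expression (4) can be simplified to
$$[\![n \neq 0]\!] \cdot [\![\psi[n/n-1]]\!] \cdot \alpha \cdot (n-1)^a \cdot b^{n-1}. \tag{5}$$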


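Clearly, there is a $\psi' \in C$ such that $[\![\psi']\!]$ is equivalent to $[\![n \neq 0]\!] \cdot [\![\psi[n/n-1]]\!]$. Moreover, $\alpha \cdot b^{n-1} = \frac{\alpha}{b} \cdot b^n$ where $\frac{\alpha}{b} \in \mathrm{Af}[\bar{x}]$. Hence, due to the Binomial Theorem,

$$[\![n \neq 0]\!] \cdot [\![\psi[n/n-1]]\!] \cdot \alpha \cdot (n-1)^a \cdot b^{n-1} = \sum_{i=0}^{a} [\![\psi']\!] \cdot \frac{\alpha}{b} \cdot \binom{a}{i} \cdot (-1)^i \cdot n^{a-i} \cdot b^n, \tag{6}$$

which is a poly-exponential expression as $\frac{\alpha}{b} \cdot \binom{a}{i} \cdot (-1)^i \in \mathrm{Af}[\bar{x}]$.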


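From now on, let $m \geq 1$. If $\psi$ contains a positive literal $n = c$, then we get

$$\begin{aligned}
&\sum_{i=1}^{n} [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} \\
&= \sum_{i=1}^{n} [\![n > i-1]\!] \cdot [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} && (\dagger) \\
&= [\![n > c]\!] \cdot [\![\psi]\!](c) \cdot m^{n-c-1} \cdot \alpha \cdot c^a \cdot b^c && (\dagger\dagger) \\
&= \begin{cases} 0, & \text{if } [\![\psi]\!](c) = 0 \\ [\![n > c]\!] \cdot \frac{\alpha}{m^{c+1}} \cdot c^a \cdot b^c \cdot m^n, & \text{if } [\![\psi]\!](c) = 1 \end{cases} \\
&\in \mathrm{PE}[\bar{x}] \quad \left(\text{since } \tfrac{\alpha}{m^{c+1}} \cdot c^a \cdot b^c \in \mathrm{Af}[\bar{x}]\right).
\end{aligned} \tag{7}$$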
The step marked with (†) holds as we have $[\![n > i-1]\!] = 1$ for all $i \in \{1, \ldots, n\}$, and the step marked with (††) holds since $i \neq c+1$ implies $[\![\psi]\!](i-1) = 0$. If $\psi$ does not contain a positive literal, then let $c$ be the maximal constant that occurs in $\psi$, or $-1$ if $\psi$ is empty. We get:


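$$\begin{aligned}
&\sum_{i=1}^{n} [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} \\
&= \sum_{i=1}^{n} [\![n > i-1]\!] \cdot [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} && (\dagger) \\
&= \sum_{i=1}^{c+1} [\![n > i-1]\!] \cdot [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} + \sum_{i=c+2}^{n} m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1}
\end{aligned} \tag{8}$$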


Again, the step marked with (†) holds since we have $[\![n > i-1]\!] = 1$ for all $i \in \{1, \ldots, n\}$. The last step holds as $i \geq c+2$ implies $[\![\psi]\!](i-1) = 1$. Similar to the case where $\psi$ contains a positive literal, we can compute a poly-exponential expression which is equivalent to the first addend. We have


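$$\sum_{i=1}^{c+1} [\![n > i-1]\!] \cdot [\![\psi]\!](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} = \sum_{i=1}^{c+1} [\![n > i-1]\!] \cdot \frac{[\![\psi]\!](i-1) \cdot \alpha \cdot (i-1)^a \cdot b^{i-1}}{m^i} \cdot m^n \in \mathrm{PE}[\bar{x}] \tag{9}$$

as $\frac{[\![\psi]\!](i-1) \cdot \alpha \cdot (i-1)^a \cdot b^{i-1}}{m^i} \in \mathrm{Af}[\bar{x}]$. For the second addend, we have: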


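$$\begin{aligned}
&\sum_{i=c+2}^{n} m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} \\
&= \frac{\alpha}{b} \cdot m^n \cdot \sum_{i=c+2}^{n} q[n/i] \cdot \left(\tfrac{b}{m}\right)^i && \text{with } q = (n-1)^a \in \mathbb{Q}[n] \\
&= \frac{\alpha}{b} \cdot m^n \cdot \sum_{i=c+2}^{n} \left(r[n/i] - \tfrac{m}{b} \cdot r[n/i-1]\right) \cdot \left(\tfrac{b}{m}\right)^i && \text{(Lemma 10 with } \tfrac{m}{b} \text{ as the factor } c\text{)} \\
&= \frac{\alpha}{b} \cdot m^n \cdot \left(\sum_{i=c+2}^{n} r[n/i] \cdot \left(\tfrac{b}{m}\right)^i - \sum_{i=c+2}^{n} r[n/i-1] \cdot \left(\tfrac{b}{m}\right)^{i-1}\right) \\
&= \frac{\alpha}{b} \cdot m^n \cdot \left(\sum_{i=c+2}^{n} r[n/i] \cdot \left(\tfrac{b}{m}\right)^i - \sum_{i=c+1}^{n-1} r[n/i] \cdot \left(\tfrac{b}{m}\right)^{i}\right) \\
&= \frac{\alpha}{b} \cdot m^n \cdot [\![n > c+1]\!] \cdot \left(r \cdot \left(\tfrac{b}{m}\right)^n - r[n/c+1] \cdot \left(\tfrac{b}{m}\right)^{c+1}\right) \\
&= [\![n > c+1]\!] \cdot \frac{\alpha}{b} \cdot r \cdot b^n - [\![n > c+1]\!] \cdot r[n/c+1] \cdot \left(\tfrac{b}{m}\right)^{c+1} \cdot \frac{\alpha}{b} \cdot m^n
\end{aligned} \tag{10}$$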


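Lemma 10 ensures $r \in \mathbb{Q}[n]$, i.e., we have $r = \sum_{i=0}^{d_r} m_i \cdot n^i$ for some $d_r \in \mathbb{N}$ and $m_i \in \mathbb{Q}$. Thus, $r[n/c+1] \cdot \left(\frac{b}{m}\right)^{c+1} \cdot \frac{\alpha}{b} \in \mathrm{Af}[\bar{x}]$, which implies $[\![n > c+1]\!] \cdot r[n/c+1] \cdot \left(\frac{b}{m}\right)^{c+1} \cdot \frac{\alpha}{b} \cdot m^n \in \mathrm{PE}[\bar{x}]$. It remains to show that the addend $[\![n > c+1]\!] \cdot \frac{\alpha}{b} \cdot r \cdot b^n$ is equivalent to a poly-exponential expression. As $\frac{\alpha}{b} \cdot m_i \in \mathrm{Af}[\bar{x}]$, we have

$$[\![n > c+1]\!] \cdot \frac{\alpha}{b} \cdot r \cdot b^n = \sum_{i=0}^{d_r} [\![n > c+1]\!] \cdot \frac{\alpha}{b} \cdot m_i \cdot n^i \cdot b^n \in \mathrm{PE}[\bar{x}]. \tag{11}$$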


The proof of Lemma 12 gives rise to a corresponding algorithm. 
Algorithm 2. symbolic_sum
Input: $m \in \mathbb{N}$, $p \in \mathrm{PE}[\bar{x}]$
Result: $q \in \mathrm{PE}[\bar{x}]$ which is equivalent to $\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]$
rearrange $\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]$ to $\sum_{j=1}^{\ell} p_j$ as in (3)
foreach $p_j \in \{p_1, \ldots, p_\ell\}$ do
    if $m = 0$ then compute $q_j$ as in (5) and (6)
    else if $p_j$ has the form $\sum_{i=1}^{n} [\![\ldots \wedge n = c \wedge \ldots]\!](i-1) \cdot \ldots$ then compute $q_j$ as in (7)
    else
        split $p_j$ into two sums $p_{j,1}$ and $p_{j,2}$ as in (8)
        compute $q_{j,1}$ from $p_{j,1}$ as in (9)
        compute $q_{j,2}$ from $p_{j,2}$ as in (10) and (11) using Algorithm 1
        $q_j \leftarrow q_{j,1} + q_{j,2}$
return $\sum_{j=1}^{\ell} q_j$
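In the case $m \geq 1$, the core of Algorithm 2 is the telescoping step (10) followed by (11). The following Python sketch has a deliberately narrow scope:

- It covers only an addend without a factor $[\![\psi]\!]$, i.e., $c = -1$ in (8), so that the first sum of (8) is empty.
- It evaluates the resulting poly-exponential expression numerically instead of building it symbolically.
- `compute_r` transcribes Algorithm 1. `symbolic_sum_plain` and `brute_sum` are hypothetical names.

Comparing the result with the brute-force sum is only a finite check.

```python
from fractions import Fraction
from math import comb

def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    size = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (size - len(p))
    q = list(q) + [Fraction(0)] * (size - len(q))
    return trim([u + v for u, v in zip(p, q)])

def shift(p):
    """Coefficients of p[n/n-1]."""
    out = [Fraction(0)] * len(p)
    for i, ci in enumerate(p):
        for k in range(i + 1):
            out[k] += ci * comb(i, k) * (-1) ** (i - k)
    return out

def evaluate(p, n):
    return sum(ci * Fraction(n) ** i for i, ci in enumerate(p))

def compute_r(q, c):
    """Algorithm 1: r with q = r - c * r[n/n-1]."""
    q, c = trim([Fraction(u) for u in q]), Fraction(c)
    d = len(q) - 1
    if d == 0:
        return [Fraction(0), q[0]] if c == 1 else [q[0] / (1 - c)]
    if c == 1:
        s = [Fraction(0)] * (d + 1) + [q[d] / (d + 1)]
    else:
        s = [Fraction(0)] * d + [q[d] / (1 - c)]
    rest = add(add(q, [-u for u in s]), [c * u for u in shift(s)])
    return add(s, compute_r(rest, c))

def symbolic_sum_plain(m, alpha, a, b):
    """Closed form, via (10) and (11), of sum_{i=1}^n m^(n-i) * alpha * (i-1)^a * b^(i-1)
    for m >= 1 and c = -1; returns a function of n evaluating that expression."""
    q = [Fraction(comb(a, k) * (-1) ** (a - k)) for k in range(a + 1)]  # (n-1)^a
    r = compute_r(q, Fraction(m, b))       # Lemma 10 with factor m/b
    k = Fraction(alpha) / b
    r0 = evaluate(r, 0)                    # r[n/c+1] with c = -1
    def closed(n):
        if n <= 0:                         # factor [[n > c+1]] = [[n > 0]]
            return Fraction(0)
        return k * evaluate(r, n) * Fraction(b) ** n - r0 * k * Fraction(m) ** n
    return closed

def brute_sum(m, alpha, a, b, n):
    return sum(Fraction(m) ** (n - i) * alpha * (i - 1) ** a * Fraction(b) ** (i - 1)
               for i in range(1, n + 1))

print(symbolic_sum_plain(4, 4, 0, 1)(3))  # 84, i.e. 4 * (4^3 - 1) / 3
```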


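Example 13. We compute an equivalent poly-exponential expression for
$$\sum_{i=1}^{n} 4^{n-i} \cdot \left([\![n = 0]\!] \cdot 2 \cdot w + [\![n \neq 0]\!] \cdot 4 - 2\right)[n/i-1] \tag{12}$$
where $w$ is a variable. (It will later on be needed to compute a closed form for Example 4, see Example 18.) According to Algorithm 2 and (3), we get
$$\begin{aligned}
&\sum_{i=1}^{n} 4^{n-i} \cdot \left([\![n = 0]\!] \cdot 2 \cdot w + [\![n \neq 0]\!] \cdot 4 - 2\right)[n/i-1] \\
&= \sum_{i=1}^{n} 4^{n-i} \cdot \left([\![i-1 = 0]\!] \cdot 2 \cdot w + [\![i-1 \neq 0]\!] \cdot 4 - 2\right) \\
&= p_1 + p_2 + p_3
\end{aligned}$$
with $p_1 = \sum_{i=1}^{n} [\![i-1 = 0]\!] \cdot 4^{n-i} \cdot 2 \cdot w$, $p_2 = \sum_{i=1}^{n} [\![i-1 \neq 0]\!] \cdot 4^{n-i} \cdot 4$, and $p_3 = \sum_{i=1}^{n} 4^{n-i} \cdot (-2)$. We search for $q_1, q_2, q_3 \in \mathrm{PE}[w]$ that are equivalent to $p_1, p_2, p_3$, i.e., $q_1 + q_2 + q_3$ is equivalent to (12). We only show how to compute $q_2$ (and omit the computation of $q_1 = [\![n \neq 0]\!] \cdot \frac{1}{2} \cdot w \cdot 4^n$ and $q_3 = \frac{2}{3} - \frac{2}{3} \cdot 4^n$). Analogously to (8), we get: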


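$$\begin{aligned}
p_2 &= \sum_{i=1}^{n} [\![i-1 \neq 0]\!] \cdot 4^{n-i} \cdot 4 \\
&= \sum_{i=1}^{n} [\![n > i-1]\!] \cdot [\![i-1 \neq 0]\!] \cdot 4^{n-i} \cdot 4 \\
&= \sum_{i=1}^{1} [\![n > i-1]\!] \cdot [\![i-1 \neq 0]\!] \cdot 4^{n-i} \cdot 4 + \sum_{i=2}^{n} 4^{n-i} \cdot 4
\end{aligned}$$

The next step is to rearrange the first sum as in (9). In our example, it directly simplifies to $0$ and hence we obtain

$$\sum_{i=1}^{1} [\![n > i-1]\!] \cdot [\![i-1 \neq 0]\!] \cdot 4^{n-i} \cdot 4 + \sum_{i=2}^{n} 4^{n-i} \cdot 4 = \sum_{i=2}^{n} 4^{n-i} \cdot 4.$$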
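Finally, by applying the steps from (10) we get:

$$\begin{aligned}
\sum_{i=2}^{n} 4^{n-i} \cdot 4 &= 4 \cdot 4^n \cdot \sum_{i=2}^{n} \left(\tfrac{1}{4}\right)^i \\
&= 4 \cdot 4^n \cdot \sum_{i=2}^{n} \left(-\tfrac{1}{3} - 4 \cdot \left(-\tfrac{1}{3}\right)\right) \cdot \left(\tfrac{1}{4}\right)^i && (\dagger) \\
&= 4 \cdot 4^n \cdot \left(\sum_{i=2}^{n} \left(-\tfrac{1}{3}\right) \cdot \left(\tfrac{1}{4}\right)^i - \sum_{i=2}^{n} 4 \cdot \left(-\tfrac{1}{3}\right) \cdot \left(\tfrac{1}{4}\right)^i\right) \\
&= 4 \cdot 4^n \cdot \left(\sum_{i=2}^{n} \left(-\tfrac{1}{3}\right) \cdot \left(\tfrac{1}{4}\right)^i - \sum_{i=1}^{n-1} \left(-\tfrac{1}{3}\right) \cdot \left(\tfrac{1}{4}\right)^i\right) \\
&= 4 \cdot 4^n \cdot [\![n > 1]\!] \cdot \left(\left(-\tfrac{1}{3}\right) \cdot \left(\tfrac{1}{4}\right)^n - \left(-\tfrac{1}{3}\right) \cdot \tfrac{1}{4}\right) \\
&= [\![n > 1]\!] \cdot \left(-\tfrac{4}{3}\right) + [\![n > 1]\!] \cdot \tfrac{1}{3} \cdot 4^n \\
&= q_2
\end{aligned}$$

The step marked with (†) holds by Lemma 10 with $q = 1$ and $c = 4$. Thus, we have $r = -\frac{1}{3}$, cf. Example 11.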
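The result of Example 13 can be checked numerically. This Python sketch evaluates the sum (12) directly and compares it with $q_1 + q_2 + q_3$, reading $[\![n > 1]\!]$ as $n \neq 0 \wedge n \neq 1$. Exact rational arithmetic via `fractions.Fraction` avoids rounding issues. The function names are illustrative.

```python
from fractions import Fraction

def lhs(n, w):
    """The sum (12): sum_{i=1}^n 4^(n-i) * ([[n=0]]*2*w + [[n!=0]]*4 - 2)[n/i-1]."""
    total = Fraction(0)
    for i in range(1, n + 1):
        j = i - 1  # value substituted for n inside the bracket
        inner = (2 * w if j == 0 else 0) + (4 if j != 0 else 0) - 2
        total += Fraction(4) ** (n - i) * inner
    return total

def rhs(n, w):
    """q1 + q2 + q3 as computed in Example 13."""
    q1 = Fraction(1, 2) * w * 4 ** n if n != 0 else 0
    q2 = Fraction(-4, 3) + Fraction(1, 3) * 4 ** n if n > 1 else 0
    q3 = Fraction(2, 3) - Fraction(2, 3) * 4 ** n
    return q1 + q2 + q3

print(all(lhs(n, w) == rhs(n, w) for n in range(10) for w in range(-5, 6)))  # True
```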


Recall that our goal is to compute closed forms for loops. As a first step, instead of the $n$-fold update function $h(n, \bar{x}) = f^n(\bar{x})$ of (1), where $f$ is the update of (1), we consider a recursive update function for a single variable $x \in \bar{x}$:

$$g(0, \bar{x}) = x \quad\text{and}\quad g(n, \bar{x}) = m \cdot g(n-1, \bar{x}) + p[n/n-1] \quad \text{for all } n > 0$$

Here, $m \in \mathbb{N}$ and $p \in \mathrm{PE}[\bar{x}]$. Using Lemma 12, it is easy to show that $g$ can be represented by a poly-exponential expression.


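Lemma 14 (Closed Form for Single Variables). If $x \in \bar{x}$, $m \in \mathbb{N}$, and $p \in \mathrm{PE}[\bar{x}]$, then one can compute a $q \in \mathrm{PE}[\bar{x}]$ which satisfies

$$q[n/0] = x \quad\text{and}\quad q = (m \cdot q + p)[n/n-1] \quad \text{for all } n > 0.$$

Proof. It suffices to find a $q \in \mathrm{PE}[\bar{x}]$ that satisfies
$$q = m^n \cdot x + \sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]. \tag{13}$$
To see why (13) is sufficient, note that (13) implies
$$q[n/0] = m^0 \cdot x + \sum_{i=1}^{0} m^{0-i} \cdot p[n/i-1] = x$$
and for $n > 0$, (13) implies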


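$$\begin{aligned}
q &= m^n \cdot x + \sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1] \\
&= m^n \cdot x + \left(\sum_{i=1}^{n-1} m^{n-i} \cdot p[n/i-1]\right) + p[n/n-1] \\
&= m \cdot \left(m^{n-1} \cdot x + \sum_{i=1}^{n-1} m^{n-i-1} \cdot p[n/i-1]\right) + p[n/n-1] \\
&= m \cdot q[n/n-1] + p[n/n-1] \\
&= (m \cdot q + p)[n/n-1].
\end{aligned}$$

By Lemma 12, we can compute a $q' \in \mathrm{PE}[\bar{x}]$ such that

$$m^n \cdot x + \sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1] = m^n \cdot x + q'.$$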
Moreover,

$$\text{if } m = 0, \text{ then } m^n \cdot x = [\![n = 0]\!] \cdot x \in \mathrm{PE}[\bar{x}] \text{ and} \tag{14}$$
$$\text{if } m > 0, \text{ then } m^n \cdot x \in \mathrm{PE}[\bar{x}]. \tag{15}$$

So both addends are equivalent to poly-exponential expressions.


Example 15. We show how to compute the closed forms for the variables $w$ and $x$ from Example 4. We first consider the assignment $w \leftarrow 2$, i.e., we want to compute a $q_w \in \mathrm{PE}[w, x, y, z]$ with $q_w[n/0] = w$ and $q_w = (m_w \cdot q_w + p_w)[n/n-1]$ for $n > 0$, where $m_w = 0$ and $p_w = 2$. According to (13) and (14), $q_w$ is

$$m_w^n \cdot w + \sum_{i=1}^{n} m_w^{n-i} \cdot p_w[n/i-1] = 0^n \cdot w + \sum_{i=1}^{n} 0^{n-i} \cdot 2 = [\![n = 0]\!] \cdot w + [\![n \neq 0]\!] \cdot 2.$$

For the assignment $x \leftarrow x + 2$, we search for a $q_x$ such that $q_x[n/0] = x$ and $q_x = (m_x \cdot q_x + p_x)[n/n-1]$ for $n > 0$, where $m_x = 1$ and $p_x = 2$. By (13), $q_x$ is

$$m_x^n \cdot x + \sum_{i=1}^{n} m_x^{n-i} \cdot p_x[n/i-1] = 1^n \cdot x + \sum_{i=1}^{n} 1^{n-i} \cdot 2 = x + 2 \cdot n.$$
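Equation (13) can be tested against the recurrence of Lemma 14 for concrete values. The sketch below uses hypothetical helper names and passes $p$ as a Python function of $n$. It reproduces $q_w$ and $q_x$ from Example 15. Python evaluates `0 ** 0` to `1`, which matches the convention $0^0 = 1$ used in (13) and (14).

```python
def closed_13(m, x, p, n):
    """Right-hand side of (13): m^n * x + sum_{i=1}^{n} m^(n-i) * p(i-1)."""
    return m ** n * x + sum(m ** (n - i) * p(i - 1) for i in range(1, n + 1))

def recurrence(m, x, p, n):
    """g(0) = x and g(n) = m * g(n-1) + p(n-1), i.e. p[n/n-1] as in Lemma 14."""
    g = x
    for k in range(1, n + 1):
        g = m * g + p(k - 1)
    return g

# Example 15: w <- 2 (m_w = 0, p_w = 2) and x <- x + 2 (m_x = 1, p_x = 2), start value 5
print([closed_13(0, 5, lambda k: 2, n) for n in range(4)])  # [5, 2, 2, 2]
print([closed_13(1, 5, lambda k: 2, n) for n in range(4)])  # [5, 7, 9, 11]
```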


The restriction to triangular matrices now allows us to generalize Lemma 14 to vectors of variables. The reason is that due to triangularity, the update of each program variable $x_i$ only depends on the previous values of $x_1, \ldots, x_i$. So when regarding $x_i$, we can assume that we already know the closed forms for $x_1, \ldots, x_{i-1}$. This allows us to find closed forms for one variable after the other by applying Lemma 14 repeatedly. In other words, it allows us to find a vector $\bar{q}$ of poly-exponential expressions that satisfies

$$\bar{q}[n/0] = \bar{x} \quad\text{and}\quad \bar{q} = (A\bar{q} + \bar{a})[n/n-1] \quad \text{for all } n > 0.$$

To prove this claim, we show the more general Lemma 16. For all $i_1, \ldots, i_k \in \{1, \ldots, m\}$, we define $[z_1, \ldots, z_m]_{i_1, \ldots, i_k} = [z_{i_1}, \ldots, z_{i_k}]$ (and the notation $\bar{z}_{i_1, \ldots, i_k}$ for column vectors is defined analogously). Moreover, for a matrix $A$, $A_i$ is $A$'s $i$th row and $A_{i_1, \ldots, i_k; j_1, \ldots, j_{k'}}$ is the matrix with rows $(A_{i_1})_{j_1, \ldots, j_{k'}}, \ldots, (A_{i_k})_{j_1, \ldots, j_{k'}}$. So for

$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix}, \quad\text{we have}\quad A_{1,2;1,3} = \begin{bmatrix} a_{1,1} & a_{1,3} \\ a_{2,1} & a_{2,3} \end{bmatrix}.$$

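Lemma 16 (Closed Forms for Vectors of Variables). If $\bar{x}$ is a vector of at least $d \geq 1$ pairwise different variables, $A \in \mathbb{Z}^{d \times d}$ is triangular with $A_{i;i} \geq 0$ for all $1 \leq i \leq d$, and $\bar{p} \in \mathrm{PE}[\bar{x}]^d$, then one can compute $\bar{q} \in \mathrm{PE}[\bar{x}]^d$ such that:

$$\bar{q}[n/0] = \bar{x}_{1,\ldots,d} \quad\text{and} \tag{16}$$
$$\bar{q} = (A\bar{q} + \bar{p})[n/n-1] \quad \text{for all } n > 0. \tag{17}$$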
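Proof. Assume that $A$ is lower triangular (the case that $A$ is upper triangular works analogously). We use induction on $d$. For any $d \geq 1$ we have:

$$\begin{aligned}
& \bar{q} = (A\bar{q} + \bar{p})[n/n-1] \\
\iff\ & q_j = (A_j \cdot \bar{q} + p_j)[n/n-1] \quad \text{for all } 1 \leq j \leq d \\
\iff\ & q_j = (A_{j;2,\ldots,d} \cdot \bar{q}_{2,\ldots,d} + A_{j;1} \cdot q_1 + p_j)[n/n-1] \quad \text{for all } 1 \leq j \leq d \\
\iff\ & q_1 = (A_{1;2,\ldots,d} \cdot \bar{q}_{2,\ldots,d} + A_{1;1} \cdot q_1 + p_1)[n/n-1] \ \wedge \\
& q_j = (A_{j;2,\ldots,d} \cdot \bar{q}_{2,\ldots,d} + A_{j;1} \cdot q_1 + p_j)[n/n-1] \quad \text{for all } 1 < j \leq d \\
\iff\ & q_1 = (A_{1;1} \cdot q_1 + p_1)[n/n-1] \ \wedge \\
& q_j = (A_{j;2,\ldots,d} \cdot \bar{q}_{2,\ldots,d} + A_{j;1} \cdot q_1 + p_j)[n/n-1] \quad \text{for all } 1 < j \leq d
\end{aligned}$$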


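The last step holds as $A$ is lower triangular. By Lemma 14, we can compute a $q_1 \in \mathrm{PE}[\bar{x}]$ that satisfies

$$q_1[n/0] = x_1 \quad\text{and}\quad q_1 = (A_{1;1} \cdot q_1 + p_1)[n/n-1] \quad \text{for all } n > 0.$$

In the induction base ($d = 1$), there is no $j$ with $1 < j \leq d$. In the induction step ($d > 1$), it remains to show that we can compute $q_2, \ldots, q_d$ such that

$$q_j[n/0] = x_j \quad\text{and}\quad q_j = (A_{j;2,\ldots,d} \cdot \bar{q}_{2,\ldots,d} + A_{j;1} \cdot q_1 + p_j)[n/n-1] \quad \text{for all } 1 < j \leq d$$

for all $n > 0$. As $A_{j;1} \cdot q_1 + p_j \in \mathrm{PE}[\bar{x}]$ for each $2 \leq j \leq d$, the claim follows from the induction hypothesis.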


Together, Lemmas 14 and 16 and their proofs give rise to the following algorithm to compute a solution for (16) and (17). It computes a closed form $q_1$ for $x_1$ as in the proof of Lemma 14, constructs the argument $\bar{p}$ for the recursive call based on $A$, $q_1$, and the current value of $\bar{p}$ as in the proof of Lemma 16, and then determines the closed form for $\bar{x}_{2,\ldots,d}$ recursively.


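Algorithm 3. closed_form
Input: $\bar{x}_{1,\ldots,d}$, $A \in \mathbb{Z}^{d \times d}$ where $A_{i;i} \geq 0$ for all $1 \leq i \leq d$, $\bar{p} \in \mathrm{PE}[\bar{x}]^d$
Result: $\bar{q} \in \mathrm{PE}[\bar{x}]^d$ which satisfies (16) & (17) for the given $\bar{x}$, $A$, and $\bar{p}$
$q \leftarrow$ symbolic_sum($A_{1;1}$, $p_1$) (cf. Algorithm 2)
if $A_{1;1} = 0$ then $q_1 \leftarrow [\![n = 0]\!] \cdot x_1 + q$ else $q_1 \leftarrow A_{1;1}^n \cdot x_1 + q$ (cf. (13)-(15))
if $d > 1$ then
    $\bar{q}_{2,\ldots,d} \leftarrow$ closed_form($\bar{x}_{2,\ldots,d}$, $A_{2,\ldots,d;2,\ldots,d}$, $A_{2,\ldots,d;1} \cdot q_1 + \bar{p}_{2,\ldots,d}$)
return $\bar{q}$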
We can now prove the main theorem of this section.


Theorem 17 (Closed Forms for nnt-Loops). One can compute a closed form for every nnt-loop. In other words, if $f : \mathbb{Z}^d \to \mathbb{Z}^d$ is the update function of an nnt-loop with the variables $\bar{x}$, then one can compute a $\bar{q} \in \mathrm{PE}[\bar{x}]^d$ such that $\bar{q}[n/c] = f^c(\bar{x})$ for all $c \in \mathbb{N}$.


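Proof. Consider an nnt-loop of the form (1). By Lemma 16, we can compute a $\bar{q} \in \mathrm{PE}[\bar{x}]^d$ that satisfies

$$\bar{q}[n/0] = \bar{x} \quad\text{and}\quad \bar{q} = (A\bar{q} + \bar{a})[n/n-1] \quad \text{for all } n > 0.$$

We prove $f^c(\bar{x}) = \bar{q}[n/c]$ by induction on $c \in \mathbb{N}$. If $c = 0$, we get

$$f^c(\bar{x}) = f^0(\bar{x}) = \bar{x} = \bar{q}[n/0] = \bar{q}[n/c].$$

If $c > 0$, we get:

$$\begin{aligned}
f^c(\bar{x}) &= A f^{c-1}(\bar{x}) + \bar{a} && \text{by definition of } f \\
&= A\bar{q}[n/c-1] + \bar{a} && \text{by the induction hypothesis} \\
&= (A\bar{q} + \bar{a})[n/c-1] && \text{as } \bar{a} \in \mathbb{Z}^d \text{ does not contain } n \\
&= \bar{q}[n/c]
\end{aligned}$$

So invoking Algorithm 3 on $\bar{x}$, $A$, and $\bar{a}$ yields the closed form of an nnt-loop (1).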
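Example 18. We show how to compute the closed form for Example 4. For $y \leftarrow 2 \cdot w + 4 \cdot y - 2$, we obtain

$$\begin{aligned}
q_y &= (4 \cdot q_y + 2 \cdot q_w - 2)[n/n-1] \\
&= 4^n \cdot y + \sum_{i=1}^{n} 4^{n-i} \cdot (2 \cdot q_w - 2)[n/i-1] && \text{(by (13))} \\
&= y \cdot 4^n + \sum_{i=1}^{n} 4^{n-i} \cdot \left([\![n = 0]\!] \cdot 2 \cdot w + [\![n \neq 0]\!] \cdot 4 - 2\right)[n/i-1] && \text{(see Example 15)} \\
&= q_0 + q_1 + q_2 + q_3 && \text{(see Example 13)}
\end{aligned}$$

where $q_0 = y \cdot 4^n$. For $z \leftarrow x + 1$, we get

$$\begin{aligned}
q_z &= (q_x + 1)[n/n-1] \\
&= 0^n \cdot z + \sum_{i=1}^{n} 0^{n-i} \cdot (q_x + 1)[n/i-1] && \text{(by (13))} \\
&= [\![n = 0]\!] \cdot z + [\![n \neq 0]\!] \cdot (q_x[n/n-1] + 1) \\
&= [\![n = 0]\!] \cdot z + [\![n \neq 0]\!] \cdot \left((x + 2 \cdot n)[n/n-1] + 1\right) && \text{(see Example 15)} \\
&= [\![n = 0]\!] \cdot z + [\![n \neq 0]\!] \cdot (x - 1) + [\![n \neq 0]\!] \cdot 2 \cdot n.
\end{aligned}$$

So the closed form of Example 4 for the values of the variables after $n$ iterations is:

$$\begin{bmatrix} q_w \\ q_x \\ q_y \\ q_z \end{bmatrix} = \begin{bmatrix} [\![n = 0]\!] \cdot w + [\![n \neq 0]\!] \cdot 2 \\ x + 2 \cdot n \\ q_0 + q_1 + q_2 + q_3 \\ [\![n = 0]\!] \cdot z + [\![n \neq 0]\!] \cdot (x - 1) + [\![n \neq 0]\!] \cdot 2 \cdot n \end{bmatrix}$$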
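Putting Examples 13, 15, and 18 together, the closed form can be compared with actual runs of the nnt-loop of Example 4. The Python sketch below assumes the chained update $w \leftarrow 2$, $x \leftarrow x + 2$, $y \leftarrow 2 \cdot w + 4 \cdot y - 2$, $z \leftarrow x + 1$. It checks that evaluating $(q_w, q_x, q_y, q_z)$ at $n$ yields the state after $n$ applications of the update. As in Theorem 17, the loop condition is ignored.

```python
from fractions import Fraction
from itertools import product

def closed_form(n, w, x, y, z):
    """Closed form of Example 4 after n iterations (Examples 13, 15, and 18)."""
    qw = w if n == 0 else 2
    qx = x + 2 * n
    q0 = y * Fraction(4) ** n
    q1 = Fraction(1, 2) * w * 4 ** n if n != 0 else 0
    q2 = Fraction(-4, 3) + Fraction(1, 3) * 4 ** n if n > 1 else 0  # [[n > 1]]
    q3 = Fraction(2, 3) - Fraction(2, 3) * 4 ** n
    qz = z if n == 0 else (x - 1) + 2 * n
    return (qw, qx, q0 + q1 + q2 + q3, qz)

def chained_update(w, x, y, z):
    """Update of the nnt-loop of Example 4."""
    return (2, x + 2, 2 * w + 4 * y - 2, x + 1)

mismatches = []
for start in product(range(-2, 3), repeat=4):
    state = start
    for n in range(8):
        if closed_form(n, *start) != state:
            mismatches.append((start, n))
        state = chained_update(*state)
print(mismatches)  # []
```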
4 Deciding Non-Termination of nnt-Loops


Our proof uses the notion of eventual non-termination [4,14]. Here, the idea is 
to disregard the condition of the loop during a finite prefix of the program run. 


Definition 19 (Eventual Non-Termination). A vector $\bar{c} \in \mathbb{Z}^d$ witnesses eventual non-termination of (1) if

$$\exists n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ \varphi[\bar{x}/f^n(\bar{c})].$$

If there is such a witness, then (1) is eventually non-terminating.


Clearly, (1) is non-terminating iff (1) is eventually non-terminating [14]. Now Theorem 17 gives rise to an alternative characterization of eventual non-termination in terms of the closed form $\bar{q}$ instead of $f^n(\bar{c})$.


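Corollary 20 (Expressing Non-Termination with PE). If $\bar{q}$ is the closed form of (1), then $\bar{c} \in \mathbb{Z}^d$ witnesses eventual non-termination iff

$$\exists n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ \varphi[\bar{x}/\bar{q}][\bar{x}/\bar{c}]. \tag{18}$$

Proof. Immediate, as $\bar{q}$ is equivalent to $f^n(\bar{x})$.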


So to prove that termination of nnt-loops is decidable, we will use Corollary 20 to show that the existence of a witness for eventual non-termination is decidable. To do so, we first eliminate the factors $[\![\psi]\!]$ from the closed form $\bar{q}$. Assume that $\bar{q}$ has at least one factor $[\![\psi]\!]$ where $\psi$ is non-empty (otherwise, all factors $[\![\psi]\!]$ are equivalent to $1$) and let $c$ be the maximal constant that occurs in such a factor. Then all addends $[\![\psi]\!] \cdot \alpha \cdot n^a \cdot b^n$ where $\psi$ contains a positive literal become $0$ and all other addends become $\alpha \cdot n^a \cdot b^n$ if $n > c$. Thus, as we can assume $n_0 \geq c$ in (18) without loss of generality, all factors $[\![\psi]\!]$ can be eliminated when checking eventual non-termination.


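Corollary 21 (Removing $[\![\psi]\!]$ from PEs). Let $\bar{q}$ be the closed form of an nnt-loop (1). Let $\bar{q}_{norm}$ result from $\bar{q}$ by removing all addends $[\![\psi]\!] \cdot \alpha \cdot n^a \cdot b^n$ where $\psi$ contains a positive literal and by replacing all addends $[\![\psi]\!] \cdot \alpha \cdot n^a \cdot b^n$ where $\psi$ does not contain a positive literal by $\alpha \cdot n^a \cdot b^n$. Then $\bar{c} \in \mathbb{Z}^d$ is a witness for eventual non-termination iff

$$\exists n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ \varphi[\bar{x}/\bar{q}_{norm}][\bar{x}/\bar{c}]. \tag{19}$$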


By removing the factors $[\![\psi]\!]$ from the closed form $\bar{q}$ of an nnt-loop, we obtain normalized poly-exponential expressions.


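Definition 22 (Normalized PEs). We call $p \in \mathrm{PE}[\bar{x}]$ normalized if it is in

$$\mathrm{NPE}[\bar{x}] = \left\{ \sum_{j=1}^{\ell} \alpha_j \cdot n^{a_j} \cdot b_j^n \;\middle|\; \ell, a_j \in \mathbb{N},\ \alpha_j \in \mathrm{Af}[\bar{x}],\ b_j \in \mathbb{N}_{\geq 1} \right\}.$$

W.l.o.g., we always assume $(b_i, a_i) \neq (b_j, a_j)$ for all $i, j \in \{1, \ldots, \ell\}$ with $i \neq j$. We define $\mathrm{NPE} = \mathrm{NPE}[\varnothing]$, i.e., we have $p \in \mathrm{NPE}$ if $\alpha_j \in \mathbb{Q}$ for all $1 \leq j \leq \ell$.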
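Example 23. We continue Example 18. By omitting the factors $[\![\psi]\!]$,

$$q_w = [\![n = 0]\!] \cdot w + [\![n \neq 0]\!] \cdot 2 \ \text{ becomes } 2,$$
$$q_z = [\![n = 0]\!] \cdot z + [\![n \neq 0]\!] \cdot (x - 1) + [\![n \neq 0]\!] \cdot 2 \cdot n \ \text{ becomes } x - 1 + 2 \cdot n,$$

and $q_x = x + 2 \cdot n$, $q_0 = y \cdot 4^n$, and $q_3 = \frac{2}{3} - \frac{2}{3} \cdot 4^n$ remain unchanged. Moreover,

$$q_1 = [\![n \neq 0]\!] \cdot \tfrac{1}{2} \cdot w \cdot 4^n \ \text{ becomes } \tfrac{1}{2} \cdot w \cdot 4^n \quad\text{and}$$
$$q_2 = [\![n > 1]\!] \cdot \left(-\tfrac{4}{3}\right) + [\![n > 1]\!] \cdot \tfrac{1}{3} \cdot 4^n \ \text{ becomes } -\tfrac{4}{3} + \tfrac{1}{3} \cdot 4^n.$$

Thus, $q_y = q_0 + q_1 + q_2 + q_3$ becomes

$$y \cdot 4^n + \tfrac{1}{2} \cdot w \cdot 4^n - \tfrac{4}{3} + \tfrac{1}{3} \cdot 4^n + \tfrac{2}{3} - \tfrac{2}{3} \cdot 4^n = 4^n \cdot \left(y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w\right) - \tfrac{2}{3}.$$

Let $\sigma = \left[w/2,\ x/x + 2 \cdot n,\ y/4^n \cdot \left(y - \frac{1}{3} + \frac{1}{2} \cdot w\right) - \frac{2}{3},\ z/x - 1 + 2 \cdot n\right]$. Then we get that Example 2 is non-terminating iff there are $w, x, y, z \in \mathbb{Z}$, $n_0 \in \mathbb{N}$ such that

$$\begin{aligned}
& (y + z)\sigma > 0 \wedge (-w - 2 \cdot y + x)\sigma > 0 \\
\iff\ & 4^n \cdot \left(y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w\right) - \tfrac{2}{3} + x - 1 + 2 \cdot n > 0 \ \wedge \\
& -2 - 2 \cdot \left(4^n \cdot \left(y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w\right) - \tfrac{2}{3}\right) + x + 2 \cdot n > 0 \\
\iff\ & p_1 > 0 \wedge p_2 > 0
\end{aligned}$$

holds for all $n > n_0$, where

$$p_1 = 4^n \cdot \left(y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w\right) + 2 \cdot n + x - \tfrac{5}{3} \quad\text{and}\quad p_2 = 4^n \cdot \left(\tfrac{2}{3} - 2 \cdot y - w\right) + 2 \cdot n + x - \tfrac{2}{3}.$$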
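Corollary 21 is a purely syntactic transformation, so it is easy to mechanize. The following hypothetical Python encoding represents an addend $[\![\psi]\!] \cdot \alpha \cdot n^a \cdot b^n$ as a tuple with five components:

1. the set of constants in positive literals $n = c$
2. the set of constants in negative literals $n \neq c$
3. the affine coefficient $\alpha$, as a dictionary with key `"1"` for the constant part
4. the exponent $a$
5. the base $b$

Addends with equal $(b, a)$ are merged, as assumed in Definition 22. Applied to $q_y$ and $q_z$ from Example 18, the sketch reproduces the normalized forms of Example 23.

```python
from fractions import Fraction
from collections import defaultdict

def normalize(addends):
    """Corollary 21: drop addends whose guard has a positive literal, remove all
    other guards, and merge addends with the same (b, a) (Definition 22).
    Each addend is (pos, neg, alpha, a, b)."""
    merged = defaultdict(lambda: defaultdict(Fraction))
    for pos, neg, alpha, a, b in addends:
        if pos:                      # psi contains n = c: the addend vanishes for large n
            continue
        for var, coeff in alpha.items():
            merged[(b, a)][var] += coeff
    result = {}
    for key, alpha in merged.items():
        alpha = {v: c for v, c in alpha.items() if c != 0}
        if alpha:                    # skip addends whose coefficient became 0
            result[key] = alpha
    return result

F = Fraction
q_y = [  # q0 + q1 + q2 + q3; [[n > 1]] is encoded as n != 0 and n != 1
    (set(), set(), {"y": 1}, 0, 4),
    (set(), {0}, {"w": F(1, 2)}, 0, 4),
    (set(), {0, 1}, {"1": F(-4, 3)}, 0, 1),
    (set(), {0, 1}, {"1": F(1, 3)}, 0, 4),
    (set(), set(), {"1": F(2, 3)}, 0, 1),
    (set(), set(), {"1": F(-2, 3)}, 0, 4),
]
q_z = [({0}, set(), {"z": 1}, 0, 1), (set(), {0}, {"x": 1, "1": -1}, 0, 1), (set(), {0}, {"1": 2}, 1, 1)]
print(normalize(q_y))  # i.e. 4^n * (y + 1/2*w - 1/3) - 2/3
```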


Recall that the loop condition $\varphi$ is a conjunction of inequalities of the form $\alpha > 0$ where $\alpha \in \mathrm{Af}[\bar{x}]$. Thus, $\varphi[\bar{x}/\bar{q}_{norm}]$ is a conjunction of inequalities $p > 0$ where $p \in \mathrm{NPE}[\bar{x}]$, and we need to decide if there is an instantiation of these inequalities that is valid "for large enough $n$". To do so, we order the coefficients $\alpha_j$ of the addends $\alpha_j \cdot n^{a_j} \cdot b_j^n$ of normalized poly-exponential expressions according to the addend's asymptotic growth when increasing $n$. Lemma 24 shows that $\alpha_2 \cdot n^{a_2} \cdot b_2^n$ grows faster than $\alpha_1 \cdot n^{a_1} \cdot b_1^n$ iff $b_2 > b_1$ or both $b_2 = b_1$ and $a_2 > a_1$.


Lemma 24 (Asymptotic Growth). Let $b_1, b_2 \in \mathbb{N}_{\geq 1}$ and $a_1, a_2 \in \mathbb{N}$. If $(b_2, a_2) >_{lex} (b_1, a_1)$, then $\mathcal{O}(n^{a_1} \cdot b_1^n) \subsetneq \mathcal{O}(n^{a_2} \cdot b_2^n)$. Here, $>_{lex}$ is the lexicographic order, i.e., $(b_2, a_2) >_{lex} (b_1, a_1)$ iff $b_2 > b_1$ or $b_2 = b_1 \wedge a_2 > a_1$.

Proof. By considering the cases $b_2 > b_1$ and $b_2 = b_1$ separately, the claim can easily be deduced from the definition of $\mathcal{O}$.


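Definition 25 (Ordering Coefficients). Marked coefficients are of the form $\alpha^{(b,a)}$ where $\alpha \in \mathrm{Af}[\bar{x}]$, $b \in \mathbb{N}_{\geq 1}$, and $a \in \mathbb{N}$. We define $\mathrm{unmark}(\alpha^{(b,a)}) = \alpha$ and

$$\alpha_2^{(b_2,a_2)} \succ \alpha_1^{(b_1,a_1)} \quad\text{if}\quad (b_2, a_2) >_{lex} (b_1, a_1).$$

Let

$$p = \sum_{j=1}^{\ell} \alpha_j \cdot n^{a_j} \cdot b_j^n \in \mathrm{NPE}[\bar{x}],$$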


440 F. Frohn and J. Giesl 


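where $\alpha_j \neq 0$ for all $1 \leq j \leq \ell$. The marked coefficients of $p$ are

$$\mathrm{coeffs}(p) = \begin{cases} \{0^{(1,0)}\}, & \text{if } \ell = 0 \\ \{\alpha_j^{(b_j,a_j)} \mid 0 < j \leq \ell\}, & \text{otherwise.} \end{cases}$$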


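Example 26. In Example 23 we saw that the loop from Example 2 is non-terminating iff there are $w, x, y, z \in \mathbb{Z}$, $n_0 \in \mathbb{N}$ such that $p_1 > 0 \wedge p_2 > 0$ for all $n > n_0$. We get:

$$\mathrm{coeffs}(p_1) = \left\{ \left(y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w\right)^{(4,0)},\ 2^{(1,1)},\ \left(x - \tfrac{5}{3}\right)^{(1,0)} \right\}$$
$$\mathrm{coeffs}(p_2) = \left\{ \left(\tfrac{2}{3} - 2 \cdot y - w\right)^{(4,0)},\ 2^{(1,1)},\ \left(x - \tfrac{2}{3}\right)^{(1,0)} \right\}$$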
Now it is easy to see that the asymptotic growth of a normalized poly-exponential expression is solely determined by its $\succ$-maximal addend.

Corollary 27 (Maximal Addend Determines Asymptotic Growth). Let $p \in \mathrm{NPE}$ and let $\max_\succ(\mathrm{coeffs}(p)) = c^{(b,a)}$. Then $\mathcal{O}(p) = \mathcal{O}(c \cdot n^a \cdot b^n)$.

Proof. Clear, as $c \cdot n^a \cdot b^n$ is the asymptotically dominating addend of $p$.


Note that Corollary 27 would be incorrect for the case $c = 0$ if we replaced $\mathcal{O}(p) = \mathcal{O}(c \cdot n^a \cdot b^n)$ with $\mathcal{O}(p) = \mathcal{O}(n^a \cdot b^n)$, as $\mathcal{O}(0) \neq \mathcal{O}(1)$. Building upon Corollary 27, we now show that, for large $n$, the sign of a normalized poly-exponential expression is solely determined by its $\succ$-maximal coefficient. Here, we define $\mathrm{sign}(c) = -1$ if $c \in \mathbb{Q}_{<0} \cup \{-\infty\}$, $\mathrm{sign}(0) = 0$, and $\mathrm{sign}(c) = 1$ if $c \in \mathbb{Q}_{>0} \cup \{\infty\}$.


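Lemma 28 (Sign of NPEs). Let $p \in \mathrm{NPE}$. Then $\lim_{n \to \infty} p \in \mathbb{Q}$ iff $p \in \mathbb{Q}$ and otherwise, $\lim_{n \to \infty} p \in \{\infty, -\infty\}$. Moreover, we have

$$\mathrm{sign}\left(\lim_{n \to \infty} p\right) = \mathrm{sign}\left(\mathrm{unmark}\left(\max\nolimits_\succ(\mathrm{coeffs}(p))\right)\right).$$

Proof. If $p \notin \mathbb{Q}$, then the limit of each non-constant addend of $p$ is in $\{-\infty, \infty\}$ by definition of $\mathrm{NPE}$. As the asymptotically dominating addend determines $\lim_{n \to \infty} p$ and $\mathrm{unmark}(\max_\succ(\mathrm{coeffs}(p)))$ determines the sign of the asymptotically dominating addend, the claim follows.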


Lemma 29 shows the connection between the limit of a normalized poly-exponential expression $p$ and the question whether $p$ is positive for large enough $n$. The latter corresponds to the existence of a witness for eventual non-termination by Corollary 21, as $\varphi[\bar{x}/\bar{q}_{norm}]$ is a conjunction of inequalities $p > 0$ where $p \in \mathrm{NPE}[\bar{x}]$.


Lemma 29 (Limits and Positivity of NPEs). Let $p \in \mathrm{NPE}$. Then

$$\exists n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ p > 0 \iff \lim_{n \to \infty} p > 0.$$


Termination of Triangular Integer Loops is Decidable 441 


Proof. By case analysis over $\lim_{n \to \infty} p$.


Now we show that Corollary 21 allows us to decide eventual non-termination 
by examining the coefficients of normalized poly-exponential expressions. As 
these coefficients are in Af[Z], the required reasoning is decidable. 


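Lemma 30 (Deciding Eventual Positiveness of NPEs). Validity of

$$\exists \bar{c} \in \mathbb{Z}^d,\ n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ \bigwedge_{i=1}^{k} p_i[\bar{x}/\bar{c}] > 0 \tag{20}$$

where $p_1, \ldots, p_k \in \mathrm{NPE}[\bar{x}]$ is decidable.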


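Proof. For any $p_i$ with $1 \leq i \leq k$ and any $\bar{c} \in \mathbb{Z}^d$, we have $p_i[\bar{x}/\bar{c}] \in \mathrm{NPE}$. Hence:

$$\begin{aligned}
& \exists n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ \bigwedge_{i=1}^{k} p_i[\bar{x}/\bar{c}] > 0 \\
\iff\ & \bigwedge_{i=1}^{k} \exists n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ p_i[\bar{x}/\bar{c}] > 0 \\
\iff\ & \bigwedge_{i=1}^{k} \lim_{n \to \infty} p_i[\bar{x}/\bar{c}] > 0 && \text{(by Lemma 29)} \\
\iff\ & \bigwedge_{i=1}^{k} \mathrm{unmark}(\max\nolimits_\succ(\mathrm{coeffs}(p_i[\bar{x}/\bar{c}]))) > 0 && \text{(by Lemma 28)}
\end{aligned}$$

Let $p \in \mathrm{NPE}[\bar{x}]$ with $\mathrm{coeffs}(p) = \{\alpha_1^{(b_1,a_1)}, \ldots, \alpha_\ell^{(b_\ell,a_\ell)}\}$ where $\alpha_i^{(b_i,a_i)} \succ \alpha_j^{(b_j,a_j)}$ for all $1 \leq i < j \leq \ell$. If $p[\bar{x}/\bar{c}] = 0$ holds, then $\mathrm{coeffs}(p[\bar{x}/\bar{c}]) = \{0^{(1,0)}\}$ and thus $\mathrm{unmark}(\max_\succ(\mathrm{coeffs}(p[\bar{x}/\bar{c}]))) = 0$. Otherwise, there is a $1 \leq j \leq \ell$ with $\mathrm{unmark}(\max_\succ(\mathrm{coeffs}(p[\bar{x}/\bar{c}]))) = \alpha_j[\bar{x}/\bar{c}] \neq 0$ and we have $\alpha_i[\bar{x}/\bar{c}] = 0$ for all $1 \leq i \leq j-1$. Hence, $\mathrm{unmark}(\max_\succ(\mathrm{coeffs}(p[\bar{x}/\bar{c}]))) > 0$ holds iff $\bigvee_{j=1}^{\ell} \left(\alpha_j[\bar{x}/\bar{c}] > 0 \wedge \bigwedge_{i=1}^{j-1} \alpha_i[\bar{x}/\bar{c}] = 0\right)$ holds, i.e., iff $[\bar{x}/\bar{c}]$ is a model for

$$\mathrm{max\_coeff\_pos}(p) = \bigvee_{j=1}^{\ell} \left(\alpha_j > 0 \wedge \bigwedge_{i=1}^{j-1} \alpha_i = 0\right). \tag{21}$$

Hence, by the considerations above, (20) is valid iff

$$\exists \bar{c} \in \mathbb{Z}^d.\ \bigwedge_{i=1}^{k} \mathrm{max\_coeff\_pos}(p_i)[\bar{x}/\bar{c}] \tag{22}$$

is valid. By multiplying each (in-)equality in (22) with the least common multiple of all denominators, one obtains a first-order formula over the theory of linear integer arithmetic. It is well known that validity of such formulas is decidable.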


Note that (22) is valid iff $\bigwedge_{i=1}^{k} \mathrm{max\_coeff\_pos}(p_i)$ is satisfiable. So to implement our decision procedure, one can use integer programming or SMT solvers to check satisfiability of $\bigwedge_{i=1}^{k} \mathrm{max\_coeff\_pos}(p_i)$. Lemma 30 allows us to prove our main theorem.
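For a concrete assignment, formula (21) can be evaluated by scanning the coefficients in decreasing order of $(b, a)$. The Python sketch below uses a hypothetical dictionary encoding: a normalized expression maps $(b, a)$ to its affine coefficient, and a coefficient maps variable names to rationals, with key `"1"` for the constant. It makes two checks, both of which are finite experiments:

- It searches a small box of integers for a model of $\mathrm{max\_coeff\_pos}(p_1) \wedge \mathrm{max\_coeff\_pos}(p_2)$, with $p_1, p_2$ from Example 23. This illustrates the idea but does not replace the SMT or integer-programming query.
- It compares (21) with the sign of $p$ at a large $n$, in line with Lemmas 28 and 29.

```python
from fractions import Fraction
from itertools import product

def value(alpha, env):
    """Value of an affine coefficient under the assignment env."""
    return sum(c * (1 if v == "1" else env[v]) for v, c in alpha.items())

def max_coeff_pos(p, env):
    """Formula (21) under env: some coefficient is positive and all coefficients
    that are larger w.r.t. the lexicographic order on (b, a) are zero."""
    ordered = [p[key] for key in sorted(p, reverse=True)]  # tuples compare lexicographically
    for j, alpha in enumerate(ordered):
        if value(alpha, env) > 0 and all(value(ordered[i], env) == 0 for i in range(j)):
            return True
    return False

def npe_value(p, env, n):
    """Value of the normalized expression p for a concrete n."""
    return sum(value(alpha, env) * n ** a * b ** n for (b, a), alpha in p.items())

F = Fraction
p1 = {(4, 0): {"y": F(1), "1": F(-1, 3), "w": F(1, 2)}, (1, 1): {"1": F(2)}, (1, 0): {"x": F(1), "1": F(-5, 3)}}
p2 = {(4, 0): {"1": F(2, 3), "y": F(-2), "w": F(-1)}, (1, 1): {"1": F(2)}, (1, 0): {"x": F(1), "1": F(-2, 3)}}

grid = range(-6, 7)
hits = [(w, x, y) for w, x, y in product(grid, grid, grid)
        if max_coeff_pos(p1, {"w": w, "x": x, "y": y}) and max_coeff_pos(p2, {"w": w, "x": x, "y": y})]
print(hits)  # []
```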


Theorem 31. Termination of triangular loops is decidable. 


Proof. By Theorem 8, termination of triangular loops is decidable iff termination of nnt-loops is decidable. For an nnt-loop (1) we obtain $\bar{q}_{norm} \in \mathrm{NPE}[\bar{x}]^d$ (see Theorem 17 and Corollary 21) such that (1) is non-terminating iff

$$\exists \bar{c} \in \mathbb{Z}^d,\ n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ \varphi[\bar{x}/\bar{q}_{norm}][\bar{x}/\bar{c}]$$

is valid, where $\varphi$ is a conjunction of inequalities of the form $\alpha > 0$, $\alpha \in \mathrm{Af}[\bar{x}]$. Hence,

$$\varphi[\bar{x}/\bar{q}_{norm}][\bar{x}/\bar{c}] = \bigwedge_{i=1}^{k} p_i[\bar{x}/\bar{c}] > 0$$

where $p_1, \ldots, p_k \in \mathrm{NPE}[\bar{x}]$. Thus, the formula above has the form (20), and its validity is decidable by Lemma 30.


The following algorithm summarizes our decision procedure. 


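Algorithm 4. Deciding Termination of Triangular Loops
Input: a triangular loop (1)
Result: $\top$ if (1) terminates, $\bot$ otherwise
• apply Definition 3 to (1), i.e.,
    $\varphi \leftarrow \varphi \wedge \varphi[\bar{x}/A\bar{x} + \bar{a}]$
    $\bar{a} \leftarrow A\bar{a} + \bar{a}$
    $A \leftarrow A^2$
• $\bar{q} \leftarrow$ closed_form($\bar{x}$, $A$, $\bar{a}$) (cf. Algorithm 3)
• compute $\bar{q}_{norm}$ as in Corollary 21
• compute $\varphi[\bar{x}/\bar{q}_{norm}] = \bigwedge_{i=1}^{k} p_i > 0$
• compute $\psi = \bigwedge_{i=1}^{k} \mathrm{max\_coeff\_pos}(p_i)$ (cf. (21))
• if $\psi$ is satisfiable then return $\bot$ else return $\top$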


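Example 32. In Example 26 we showed that Example 2 is non-terminating iff

$$\exists w, x, y, z \in \mathbb{Z},\ n_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}_{>n_0}.\ p_1 > 0 \wedge p_2 > 0$$

is valid. By Lemma 30, this is the case iff $\mathrm{max\_coeff\_pos}(p_1) \wedge \mathrm{max\_coeff\_pos}(p_2)$, i.e.,

$$\begin{aligned}
& \left(y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w > 0 \ \vee\ 2 > 0 \wedge y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w = 0 \ \vee\ x - \tfrac{5}{3} > 0 \wedge 2 = 0 \wedge y - \tfrac{1}{3} + \tfrac{1}{2} \cdot w = 0\right) \\
\wedge\ & \left(\tfrac{2}{3} - 2 \cdot y - w > 0 \ \vee\ 2 > 0 \wedge \tfrac{2}{3} - 2 \cdot y - w = 0 \ \vee\ x - \tfrac{2}{3} > 0 \wedge 2 = 0 \wedge \tfrac{2}{3} - 2 \cdot y - w = 0\right)
\end{aligned}$$

is satisfiable. Since $2 > 0$ is true and $2 = 0$ is false, the first disjunction is equivalent to $y - \frac{1}{3} + \frac{1}{2} \cdot w \geq 0$. The second disjunction is equivalent to $\frac{2}{3} - 2 \cdot y - w \geq 0$, i.e., to $y - \frac{1}{3} + \frac{1}{2} \cdot w \leq 0$. So the formula is equivalent to $6 \cdot y - 2 + 3 \cdot w = 0$. This equation has no integer solutions, as $6 \cdot y + 3 \cdot w$ is divisible by $3$ but $2$ is not. Hence, the loop of Example 2 terminates.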


Example 33 shows that our technique does not yield witnesses for non- 
termination, but it only proves the existence of a witness for eventual non- 
termination. While such a witness can be transformed into a witness for non- 
termination by applying the loop several times, it is unclear how often the loop 
needs to be applied. 


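Example 33. Consider the following non-terminating loop:

$$\textbf{while}\ x > 0\ \textbf{do}\ \begin{bmatrix} x \\ y \end{bmatrix} \leftarrow \begin{bmatrix} x + y \\ 1 \end{bmatrix} \tag{23}$$

The closed form of $x$ is $q = [\![n = 0]\!] \cdot x + [\![n \neq 0]\!] \cdot (x + y + n - 1)$. Replacing $x$ with $q_{norm}$ in $x > 0$ yields $x + y + n - 1 > 0$. The maximal marked coefficient of $x + y + n - 1$ is $1^{(1,1)}$. So by Algorithm 4, (23) does not terminate if $\exists x, y \in \mathbb{Z}.\ 1 > 0$ is valid. While $1 > 0$ is a tautology, (23) terminates if $x \leq 0$ or $x \leq -y$.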
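Example 33 is easy to explore by simulation. The Python sketch below uses illustrative names and shows two things:

- Every start value is a witness for eventual non-termination.
- Applying the update seven times to $(x, y) = (-5, 0)$, while ignoring the guard, yields $(1, 1)$. From $(1, 1)$ the loop keeps running.

How many applications are needed depends on the start value, which is the point made above. A run counts as non-terminating if it is still active after a fixed iteration limit; this is a heuristic.

```python
def f(x, y):
    """Update of loop (23): x <- x + y, y <- 1."""
    return x + y, 1

def iterations(x, y, limit=200):
    """Iterations before the guard x > 0 fails, or None if still running at `limit`."""
    for k in range(limit):
        if not x > 0:
            return k
        x, y = f(x, y)
    return None

# Turn the eventual witness (-5, 0) into a witness for non-termination.
state = (-5, 0)
for _ in range(7):
    state = f(*state)
print(state, iterations(*state))  # (1, 1) None
```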
However, the final formula constructed by Algorithm 4 precisely describes all witnesses for eventual non-termination.

Lemma 34 (Witnessing Eventual Non-Termination). Let (1) be a triangular loop, let $\bar{q}_{norm}$ be the normalized closed form of (2), and let

$$(\varphi \wedge \varphi[\bar{x}/A\bar{x} + \bar{a}])[\bar{x}/\bar{q}_{norm}] = \bigwedge_{i=1}^{k} p_i > 0.$$

Then $\bar{c} \in \mathbb{Z}^d$ witnesses eventual non-termination of (1) iff $[\bar{x}/\bar{c}]$ is a model for

$$\bigwedge_{i=1}^{k} \mathrm{max\_coeff\_pos}(p_i).$$


5 Conclusion 


We presented a decision procedure for termination of affine integer loops with 
triangular update matrices. In this way, we contribute to the ongoing challenge 
of proving the 15-year-old conjecture by Tiwari [15] that termination of affine
integer loops is decidable. After linear loops [4], loops with at most 4 variables 
[14], and loops with diagonalizable update matrices [3,14], triangular loops are 
the fourth important special case where decidability could be proven. 

The key idea of our decision procedure is to compute closed forms for the 
values of the program variables after a symbolic number of iterations n. While 
these closed forms are rather complex, it turns out that reasoning about first- 
order formulas over the theory of linear integer arithmetic suffices to analyze 
their behavior for large n. This allows us to reduce (non-)termination of tri- 
angular loops to integer programming. In future work, we plan to investigate 
generalizations of our approach to other classes of integer loops. 
Abstract. Ensuring that compiler optimizations are correct is important for the reliability of the entire software ecosystem, since all software is compiled. Alive [12] is a tool for verifying LLVM’s peephole optimizations. Since Alive was released, it has helped compiler developers
proactively find dozens of bugs in LLVM, avoiding potentially hazardous 
miscompilations. Despite having verified many LLVM optimizations so 
far, Alive is itself not verified, which has led to at least once declaring 
an optimization correct when it was not. 

We introduce AliveInLean, a formally verified peephole optimization 
verifier for LLVM. As the name suggests, AliveInLean is a reengineered 
version of Alive developed in the Lean theorem prover [14]. Assuming 
that the proof obligations are correctly discharged by an SMT solver, 
AliveInLean gives the same level of correctness guarantees as state-of- 
the-art formal frameworks such as CompCert [11], Peek [15], and Vel- 
lvm [26], while inheriting the advantages of Alive (significantly more 
automation and easy adoption by compiler developers). 


Keywords: Compiler verification - Peephole optimization - LLVM - 
Lean - Alive 


1 Introduction 


Verifying compiler optimizations is important to ensure reliability of the soft- 
ware ecosystem. Various frameworks have been proposed to verify optimizations 
of industrial compilers. Among them, Alive [12] is a tool for verifying peephole 
optimizations of LLVM that has been successfully adopted by compiler develop- 
ers. Since it was released, Alive has helped developers find dozens of bugs. 
Figure 1 shows the structure of Alive. An optimization pattern of interest 
written in a domain-specific language is given as input. Alive parses the input, 
and encodes the behavior of the source and target programs into logic formulas in 
the theory of quantified bit-vectors and arrays. Finally, several proof obligations 
are created from the encoded behavior, and then checked by an SMT solver. 
Alive relies on a three-fold trust base: first, the semantics of LLVM’s intermediate representation and of SMT expressions; second, Alive’s verification condition generator; and finally, the SMT solver used to discharge proof obligations. None of these are formally verified, and thus an error in any of them may result in an incorrect answer.

[Figure: an input optimization (source: %s = shl 2, %N; %q = zext %s; %r = udiv %x, %q; target: %N2 = add %N, 1; %N3 = zext %N2; %r = lshr %x, %N3) is encoded via the semantics into logical formulas, turned by VCGen into a verification condition, and checked by an SMT solver.]

Fig. 1. The structure of Alive and AliveInLean

To address this problem, we introduce AliveInLean, a formally verified peep- 
hole optimization verifier for LLVM. AliveInLean is written in Lean [14], an 
interactive theorem proving language. Its semantics of LLVM IR (Intermedi- 
ate Representation) and SMT expressions are rigorously tested using Lean’s 
metaprogramming language [5] and system library. AliveInLean’s verification
condition generator is formally verified in Lean. 

Using AliveInLean requires less human effort than directly proving the opti- 
mizations on formal frameworks thanks to automation given by SMT solvers. For 
example, verifying the correctness of a peephole optimization on a formal frame- 
work requires more than a hundred lines of proofs [15]. However, the correctness 
of AliveInLean relies on the correctness of the used SMT solver. To counteract 
the dependency on SMT solvers, proof obligations can be cross-checked with 
multiple SMT solvers. Moreover, there is substantial work towards making SMT 
solvers generate proof certificates [2,3,6,7]. 

AliveInLean is a proof of concept. It currently does not support all operations that Alive supports, e.g., memory-related operations. However, AliveInLean supports all integer peephole optimizations, which is already useful in practice as
most bugs found by Alive were in integer optimizations [12]. 


2 Overview 


We give an overview of AliveInLean’s features from a user’s perspective. 


Verifying Optimizations. AliveInLean reads optimization(s) from a file and
checks their correctness. A user writes an optimization of interest in a DSL with 
similar syntax to that of LLVM IR: 
Name: AddSub:1309
%lhs = and i4 %a, %b
%rhs = or i4 %a, %b
%r = add i4 %lhs, %rhs
  =>
%r = add i4 %a, %b

This example transformation corresponds to rewriting (%a & %b) + (%a | %b) to %a + %b, given 4-bit integers %a and %b. The last variable %r, or root variable, is assumed to be the return value of the programs. AliveInLean encodes the behavior of each program and generates verification conditions (VCs). Finally, AliveInLean calls Z3 to discharge the VCs.
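At such a tiny bit width, the value part of this transformation can also be checked by brute force. The Python sketch below is not how AliveInLean works: AliveInLean encodes both programs symbolically and asks Z3. The sketch only enumerates all pairs of 4-bit inputs, and it ignores poison, undef, and UB, which the real refinement check also covers.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def source(a, b):
    """%lhs = and i4 %a, %b; %rhs = or i4 %a, %b; %r = add i4 %lhs, %rhs"""
    return ((a & b) + (a | b)) & MASK  # i4 addition wraps around modulo 2^4

def target(a, b):
    """%r = add i4 %a, %b"""
    return (a + b) & MASK

counterexamples = [(a, b) for a in range(1 << WIDTH) for b in range(1 << WIDTH)
                   if source(a, b) != target(a, b)]
print(counterexamples)  # []
```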


Proving Useful Properties. AliveInLean can be used as a formal framework 
to prove lemmas using interactive theorem proving. This is helpful when a user 
wants to show a property of a program which is hard to represent as a transfor- 
mation. 

For example, one may want to prove that the divisor of udiv (unsigned 
division) is never poison¹ if it did not raise undefined behavior (UB). The lemma
below states this in Lean. This lemma says that the divisor val is never poison 
if the state st’ after executing the udiv instruction (step) has no UB. 


lemma never_poison: 
forall .. (HSTEP: some st’ = step st (udiv isz name op1 op2))
(HNOUB: not (has_ub st’)) 
(HVAL: some val = get_value st op2 (ty.int isz)), 
not (is_poison val) 


Testing Specifications. AliveInLean supports random testing of its own
specifications (which cannot themselves be verified). For example, the step
function in the above example implements a specification of the LLVM IR, and it
can be tested against the behavior of the LLVM compiler. Another part of the
trust base is the specification of SMT expressions, which defines a relation
between expressions (with no free variables) and their corresponding concrete
values. These tests help build confidence in the validity of VC generation.
Running tests is helpful when a user wants to use a different version of LLVM or
modify AliveInLean's specifications (e.g., adding a new instruction to the IR).


3 Verifying Optimizations 


In this section we introduce the different components of AliveInLean that work 
together to verify an optimization. 


1 poison is a special value of LLVM representing a result of an erroneous computation. 
3.1 Semantics Encoder


Given a program and an initial state, the semantics encoder produces the final
state of the program as a set of SMT expressions. The IR interpreter is similar,
but works over concrete values rather than symbolic ones. The semantics encoder
and the IR interpreter share the same codebase (essentially the LLVM IR
semantics), which is parametric in the type of the program state. For example,
the undefined-behavior flag can be instantiated either with Lean's bool type or
with the Bool SMT expression type. Given the type, Lean's typeclass resolution
automatically selects the operations used to update the state.
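The following sketch illustrates the idea of one semantics shared by an interpreter and an encoder. Python has no typeclasses, so an explicit "domain" object stands in for Lean's instance resolution; ConcreteDomain, SymbolicDomain and udiv_ub are hypothetical names, and the SMT literal #x0 assumes a 4-bit divisor.

```python
class ConcreteDomain:
    """UB flags are plain Python bools (interpreter mode)."""

    def false(self):
        return False

    def or_(self, x, y):
        return x or y

    def is_zero(self, bits):
        return bits == 0


class SymbolicDomain:
    """UB flags are SMT-LIB Bool terms, built as strings (encoder mode)."""

    def false(self):
        return "false"

    def or_(self, x, y):
        return f"(or {x} {y})"

    def is_zero(self, bits):
        return f"(= {bits} #x0)"  # assumes a 4-bit divisor


def udiv_ub(dom, ub, divisor):
    """Shared semantics: executing udiv sets the UB flag if the divisor is zero."""
    return dom.or_(ub, dom.is_zero(divisor))
```

The same udiv_ub code yields True or False for concrete inputs and an SMT term such as "(or false (= y #x0))" for a symbolic divisor y.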


3.2 Refinement Encoder 


Given a source program, a transformed program, and an initial state, the refine- 
ment encoder emits an SMT expression that encodes the refinement check 
between the final states of the two programs. To obtain the final states, the 
semantics encoder is used. 

The refinement check proves that (1) the transformed program only triggers 
UB when the original program does (i.e., UB can only be removed), (2) the root 
variable of the transformed program is only poison when it is also poison in the 
original program, and (3) variables’ values in the final states of the two programs 
are the same when no UB is triggered and the original value is not poison. 
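Conditions (1)-(3) can be read as a predicate on two concrete final states. The sketch below restricts the check to the root variable, as Section 4.2 describes; the Final record and the refines function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Final(NamedTuple):
    ub: bool      # undefined behavior was triggered
    value: int    # value of the root variable (meaningless if poison)
    poison: bool  # the root variable is poison


def refines(src: Final, tgt: Final) -> bool:
    """Does the target's final state refine the source's (root variable only)?"""
    if src.ub:
        return True   # (1) the source already has UB, so any target behavior is allowed
    if tgt.ub:
        return False  # (1) the target may not introduce UB
    if src.poison:
        return True   # (2), (3) a poison source value may be replaced by anything
    if tgt.poison:
        return False  # (2) the target may be poison only where the source is
    return tgt.value == src.value  # (3) otherwise the values must match
```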


3.3 Parser and Z3 Backend 


The parser for Alive’s DSL is implemented using Lean’s parser monad and file 
I/O library. SMT expressions are processed with Z3 using Lean’s SMT interface. 


4 Correctness of AliveInLean 


We describe how the correctness of AliveInLean is proved. First, we explain
the correctness proofs of the semantics encoder and the refinement encoder,
showing that if the SMT expression produced by the refinement encoder is valid,
the optimization is indeed correct. Next, we explain how the trust base is tested.


4.1 Semantics Encoding 


Given an IR interpreter run, a semantics encoder encoder is correct with respect 
to run if for any IR program and input state, the final program state generated 
by run and the symbolic state encoded by encoder are equivalent. 

To formally define its correctness, an equivalence relation between SMT 
expressions and concrete values is defined. We say that an SMT expression e 
and a Lean value v are equivalent, or e ~ v, if e has no free variables and it 
evaluates to v. The equivalence relation is inductively defined with respect to 
the structure of an SMT expression. To deal with free variables, we define an
environment η, which is a set of pairs (x, v) where x is a variable and v is a
concrete value. η[e] is the expression e with every free variable x replaced with
v if (x, v) ∈ η.
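The relation e ~ v and the substitution η[e] can be sketched on a tiny expression language. The tuple encoding below is hypothetical, and the environment is represented as a dict, which assumes that each variable is bound at most once.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

# Hypothetical expression encoding: ("var", name), ("lit", value),
# ("add", e1, e2) and ("and", e1, e2) over 4-bit bit-vectors.


def substitute(env, e):
    """eta[e]: replace each free variable x with v whenever (x, v) is in env."""
    tag = e[0]
    if tag == "var":
        return ("lit", env[e[1]]) if e[1] in env else e
    if tag == "lit":
        return e
    return (tag,) + tuple(substitute(env, sub) for sub in e[1:])


def free_vars(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "lit":
        return set()
    return set().union(*(free_vars(sub) for sub in e[1:]))


def evaluate(e):
    """Evaluate a closed expression to a concrete 4-bit value."""
    tag = e[0]
    if tag == "lit":
        return e[1]
    if tag == "add":
        return (evaluate(e[1]) + evaluate(e[2])) & MASK
    if tag == "and":
        return evaluate(e[1]) & evaluate(e[2])
    raise ValueError(f"cannot evaluate {e!r}")


def equiv(e, v):
    """e ~ v: e has no free variables and evaluates to v."""
    return not free_vars(e) and evaluate(e) == v
```

For example, x + 3 is not equivalent to any value while x is free, but after applying the environment {x ↦ 14} it is equivalent to 1 (17 mod 16).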

Next, we define a program state. A state s is defined as (u, r), where u is an
undefined-behavior flag and r is a register file. r is a list of pairs (x, v), where
x is a variable and v is a value. A value v is defined as (sz, i, p), where sz is its
size in bits, i is an integer value, and p is a poison flag.

There are two kinds of states: symbolic and concrete. A symbolic state s_s is
a state whose u, i, and p are SMT expressions. A concrete state s_c is a state all
of whose attributes are concrete values. We say that s_s and s_c are equivalent,
or s_s ~ s_c, if s_s has no free variables in its attributes and its attributes are
equivalent to the corresponding attributes of s_c. η[s_s] is the symbolic state
obtained by applying the environment η to u, i, and p.

Now, the correctness of encoder with respect to run is defined as follows. It 
states that the result of encoder is equivalent to the result of run. 


Theorem 1. For all initial states s_s and s_c, program p, and environment η
such that η[s_s] ~ s_c, we have η[encoder(p, s_s)] ~ run(p, s_c).


4.2 Refinement Encoding 


Function check(p_src, p_tgt, s_s) generates an SMT expression that encodes
refinement between the source program p_src and the target program p_tgt.

We first define refinement between two concrete states. As Alive does,
AliveInLean only checks the value of the root variable of a program. Given a
root variable r, a concrete state s'_c refines s_c, written s'_c ⊑ s_c, if (1) s_c has
undefined behavior, or (2) both s_c and s'_c have values assigned to r, say v and
v', and v = poison ∨ v' = v. A target program p_tgt refines a source program p_src if
run(p_tgt, s_c) ⊑ run(p_src, s_c) holds for any initial concrete state s_c.
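Program-level refinement quantifies over every initial concrete state. For the 4-bit AddSub:1309 example that space is small enough to enumerate, poison inputs included. This sketch is a hypothetical toy interpreter (None encodes poison, and poison propagates through and, or, add); the two programs cannot raise UB, so condition (1) never applies. A deliberately wrong target is included to show that the check can fail.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1
POISON = None  # hypothetical encoding: None stands for LLVM's poison value


def binop(op, x, y):
    """4-bit binary operation; poison in either operand yields poison."""
    if x is POISON or y is POISON:
        return POISON
    return op(x, y) & MASK


def run_src(a, b):
    lhs = binop(lambda x, y: x & y, a, b)
    rhs = binop(lambda x, y: x | y, a, b)
    return binop(lambda x, y: x + y, lhs, rhs)


def run_tgt(a, b):
    return binop(lambda x, y: x + y, a, b)


def root_refines(v_src, v_tgt):
    """Root-variable refinement: v = poison or v' = v (these programs never raise UB)."""
    return v_src is POISON or v_tgt == v_src


def refines_everywhere(src, tgt):
    """Check refinement for every initial concrete state (%a, %b), poison included."""
    inputs = list(range(1 << WIDTH)) + [POISON]
    return all(root_refines(src(a, b), tgt(a, b)) for a, b in product(inputs, repeat=2))


ok = refines_everywhere(run_src, run_tgt)                                        # True
bad = refines_everywhere(run_src, lambda a, b: binop(lambda x, y: x - y, a, b))  # False
```

Theorem 2 is what lets AliveInLean replace this kind of enumeration with a single validity query on the expression produced by check.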

The correctness of check is stated as follows. 


Theorem 2. Given an initial symbolic state s_s, if η_0[check(p_src, p_tgt, s_s)] ~
true for any η_0, then for any environment η and initial state s_c such that
η[s_s] ~ s_c, we have run(p_tgt, s_c) ⊑ run(p_src, s_c).


This theorem says that if the expression returned by check evaluates to true
in every environment, program p_tgt refines program p_src.


4.3 Validity of Trust-Base 


Testing Specification of SMT Expressions. Specifications of SMT expressions
are traversed using Lean's metaprogramming language and tested. Our testing
differs from QuickChick [4], which evaluates expressions in Coq; that approach
cannot be used here because SMT expressions need to be evaluated in an SMT
solver (e.g., Z3). Example spec:


forall {sz : size} (s1 s2 : sbitvec sz) (b1 b2 : bitvector sz),
  bv_equiv s1 b1 -> bv_equiv s2 b2 ->
  bv_equiv (sbitvec.add s1 s2) (bitvector.add b1 b2)
This spec says that if SMT expressions s1, s2 of a bit-vector type (sbitvec)
are equivalent to two concrete bit-vector values b1, b2 in Lean (bitvector), an 
add expression of s1, s2 is equivalent to the result of adding b1 and b2. Function 
bitvector.add must be called in Lean, so its operands (b1, b2) are assigned 
random values in Lean. sbitvec.add is translated to SMT’s bvadd expression, 
and s1 and s2 are initialized as BitVec variables in an SMT solver. The testing 
function generates an SMT expression with random inputs like the following: 


(assert (forall ((s1 (_ BitVec 4))) (forall ((s2 (_ BitVec 4)))
  (=> (= s1 #xA) (=> (= s2 #x2) (= (bvadd s1 s2) #xC))))))


The bit-vector size sz is set to 4, and b1 and b2 are randomly initialized to
10 (#xA) and 2 (#x2). A specification is incorrect if the generated SMT
expression is not valid.
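The query generation step can be sketched as string construction: the concrete side (bitvector.add) is evaluated in the host language, and only the symbolic side (bvadd) is left for the solver. The sketch below reproduces the query shown above; add_query, random_add_query and bv_literal are hypothetical names, hexadecimal literals assume a width that is a multiple of 4, and sending the query to Z3 is not shown.

```python
import random


def bv_literal(value: int, width: int) -> str:
    """SMT-LIB hexadecimal bit-vector literal; assumes width is a multiple of 4."""
    return "#x" + format(value, "X").zfill(width // 4)


def add_query(b1: int, b2: int, width: int) -> str:
    """Instantiate the sbitvec.add / bitvector.add spec with concrete b1, b2."""
    expected = (b1 + b2) % (1 << width)  # concrete semantics of bitvector.add
    bv = f"(_ BitVec {width})"
    return (
        f"(assert (forall ((s1 {bv})) (forall ((s2 {bv}))\n"
        f"  (=> (= s1 {bv_literal(b1, width)}) (=> (= s2 {bv_literal(b2, width)})"
        f" (= (bvadd s1 s2) {bv_literal(expected, width)}))))))"
    )


def random_add_query(width: int, rng: random.Random) -> str:
    """Draw random concrete operands, as the tester does for bitvector.add."""
    return add_query(rng.randrange(1 << width), rng.randrange(1 << width), width)
```

With b1 = 10 and b2 = 2 at width 4, add_query returns exactly the query shown above.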


Testing Specification of LLVM IR. The specification of LLVM IR is tested using
randomly generated IR programs. IR programs of 5-10 randomly chosen
instructions are generated, compiled with LLVM, and run. The result of executing
each program is compared with the result of AliveInLean's IR interpreter.
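A random straight-line program generator might look like the sketch below. The instruction pool, the i8 type, the function signature and the small constants are all assumptions of this example; the paper does not describe its generator in this detail. Compiling the output with LLVM, running it, and comparing against the interpreter are not shown. Some generated programs will divide by zero (UB) or shift by at least the bit width (poison), which the comparison has to tolerate.

```python
import random

# Hypothetical instruction pool; the paper does not list which instructions it samples.
BINOPS = ["add", "sub", "mul", "and", "or", "xor", "shl", "lshr", "ashr", "udiv", "sdiv"]


def random_function(rng: random.Random, width: int = 8,
                    min_len: int = 5, max_len: int = 10) -> str:
    """Emit a straight-line LLVM IR function with min_len..max_len binary instructions."""
    ty = f"i{width}"
    values = ["%a", "%b"]  # SSA values available as operands so far
    body = []
    for k in range(rng.randint(min_len, max_len)):
        op = rng.choice(BINOPS)
        lhs = rng.choice(values)
        # Either reuse an existing value or use a small constant.
        rhs = rng.choice(values + [str(rng.randint(1, 7))])
        name = f"%v{k}"
        body.append(f"  {name} = {op} {ty} {lhs}, {rhs}")
        values.append(name)
    lines = [f"define {ty} @f({ty} %a, {ty} %b) {{", *body, f"  ret {ty} {values[-1]}", "}"]
    return "\n".join(lines)
```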


5 Evaluation 


For the evaluation, we used a computer with an Intel Core i5-6600 CPU and 8 GB 
of RAM, and Z3 [13] for SMT solving. To test whether AliveInLean and Alive 
give the same result, we used all of the 150 integer optimizations from Alive’s 
test suite that are supported by AliveInLean. No mismatches were observed. 

To test the SMT specification, we randomly generated 10,000 tests for each 
of the operations (18 bit-vector and 15 boolean). This test took 3 CPU hours. 

The LLVM IR specification was tested by running 1,000,000 random IR pro- 
grams in our interpreter and comparing the output with that of LLVM. This 
comparison needs to take into account that some programs may trigger UB or 
yield a poison value, which gives freedom to LLVM to produce a variety of results. 
These tests took 10 CPU hours overall. Four admitted arithmetic lemmas were
tested as well. As a side effect of the testing, we found several miscompilation
bugs in LLVM.²

AliveInLean³ consists of 11.9K lines of code. The optimization verifier consists
of 2.2K LoC, the specification tester of 1.5K, and the proof of 8.1K lines.
It took 3 person-months to implement the tool and prove its correctness.


6 Related Work 


We introduce previous work on compiler verification and validation and compare
it with AliveInLean. We also give an overview of previous work on the semantics
of compiler intermediate representations (IRs).


2 https://llvm.org/PR40657.
3 https://github.com/Microsoft/AliveInLean.
6.1 Compiler Verification


Proving Correctness on Formal Semantics. The correctness of compilation 
can be proved on a formal semantics of a language that is written in a theorem 
proving language such as Coq. Vellvm [26] is a Coq formalization of the semantics 
of LLVM IR. CompCert [11] is a verified C compiler written in Coq, and its
compilation to assembly languages, including x86 and PowerPC, is proved correct.

However, it is hard to apply this approach to existing industrial compilers 
because proving correctness of optimizations requires non-trivial effort. Peek [15] 
is a framework for implementing and verifying peephole optimizations for x86 
on CompCert. They implemented 28 peephole optimizations which required 3.3k 
lines of code and 6.6k lines of proofs (~350 LoC each). Even though this is small
compared to the size of CompCert, the burden is non-trivial considering that
LLVM has more than 1,000 peephole optimizations [12].

Another problem with this approach is that changing the semantics requires
modifying the proof. The semantics of LLVM's poison and undef values is
currently inconsistent, which causes miscompilation of some programs [10].
Therefore, compiler developers regularly test various undef semantics against
existing optimizations, which would be a non-trivial task if correctness
proofs had to be manually updated.


Translation Validation and Credible Compilation. In translation valida- 
tion [18], a pair of an original program and an optimized program is given to 
a validation tool at compile time to check the correctness of the optimization. 
Several such tools exist for LLVM [20, 22,25]. Translation validation is, however, 
slow, and it cannot tell whether an optimization is correct in general. Consider 
this optimization: 


z = 0 - (x / C)   =>   z = x / -C


If C is a constant, -C can be computed at compile time. However, this opti- 
mization is wrong only if C is INT_MIN. To show that compilation is fully correct, 
translation validation would need to be run for every combination of inputs. 
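The INT_MIN failure can be reproduced at 8 bits. This sketch models only two's-complement wrap-around and truncating signed division; it deliberately does not model LLVM's undefined behavior for signed-division overflow (INT_MIN / -1), so it exhibits the INT_MIN counterexample and nothing more. All names are hypothetical.

```python
WIDTH = 8
INT_MIN = -(1 << (WIDTH - 1))  # -128


def wrap(x: int) -> int:
    """Reduce x to a signed WIDTH-bit value (two's-complement wrap-around)."""
    x &= (1 << WIDTH) - 1
    return x - (1 << WIDTH) if x >= 1 << (WIDTH - 1) else x


def sdiv(x: int, y: int) -> int:
    """Signed division truncating toward zero, then wrapped (UB is not modeled)."""
    q = abs(x) // abs(y)
    return wrap(q if (x < 0) == (y < 0) else -q)


def source(x: int, c: int) -> int:
    return wrap(0 - sdiv(x, c))   # z = 0 - (x / C)


def target(x: int, c: int) -> int:
    return sdiv(x, wrap(-c))      # z = x / -C, with -C folded at compile time


values = range(INT_MIN, -INT_MIN)
bad_constants = sorted({c for c in values if c != 0
                        for x in values if source(x, c) != target(x, c)})
print(bad_constants)  # [-128]: negating INT_MIN wraps back to INT_MIN
```

For x = C = INT_MIN the source yields -1 but the target yields 1, which a translation validator would only notice if that constant actually occurred in the program being compiled.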

Credible compilation [19], or witnessing compilation [16,17], is an approach to
improve translation validation by accepting witnesses generated by a compiler.
Crellvm [8] is a credible compilation framework for LLVM. It requires modifica-
tions to the compiler, which makes it harder to apply and maintain.


6.2 Solver-Aided Programming Languages 


Proving correctness of optimizations can be framed as a search problem: finding
a counter-example to the optimization. Solvers such as Z3 and CVC4 can be used to
solve this search problem. Translating a high-level search problem into the
external solver's input has been considered bug-prone, and frameworks like
Rosette [21] and Smten [23] address this issue by providing higher-level languages
for describing the search problem. SpaceSearch [24] helps programmers prove the
correctness of the description by supporting Coq and Rosette backends from a
single specification. AliveInLean provides a stronger guarantee of correctness
because the translation to SMT expressions is also written in Lean, leaving Lean
as the sole trust base.


6.3 Semantics of Compiler IR 


Correctly encoding semantics of compiler IR is important for the validity of 
a tool. LLVM IR is an SSA-based intermediate representation which is used 
to represent a program being compiled. LLVM LangRef [1] has an informal 
definition of the LLVM IR, but it has a few known problems. [10] shows
that the semantics of poison and undef values are inconsistent, and [9] shows
that the semantics of pointer-integer casting is inconsistent. AliveInLean supports
poison but not undef, following the suggestion of [10]. AliveInLean does not
support memory-related operations, including load, store, and pointer-integer
casting.


7 Discussion 


AliveInLean has several limitations. As discussed before, AliveInLean does not 
support memory operations. Correctly encoding the memory model of LLVM
IR is challenging because it is more complex than either a byte array or a set
of memory objects [9]. Supporting branch instructions and floating point would
help developers prove interesting optimizations. Supporting branches is
challenging, especially when loops are involved.

The maintainability of AliveInLean depends heavily on proficiency in Lean:
changing the semantics of an IR instruction breaks the proof, and updating the
proof requires Lean expertise. However, because the proof is modularized, we
believe that only the relevant parts of the proof need to be updated.

Alive has features that are absent in AliveInLean. Alive supports defining a 
precondition for an optimization, inferring types of variables if not given, and 
showing counter-examples if the optimization is wrong. We leave these features
as future work.


8 Conclusion 


AliveInLean is a formally verified compiler optimization verifier. Its verification 
condition generator is formally verified with a machine-checked proof. Using 
AliveInLean, developers can easily check the correctness of compiler optimiza- 
tions with high reliability. Also, they can use AliveInLean as a formal framework 
like Vellvm to prove properties of interest in limited cases. The extensive random 
testing did not find problems in the trust base, increasing its trustworthiness. 
Moreover, as a side-effect of the IR semantics testing, we found several bugs in 
LLVM. 
Abstract. Maintaining multiple replicas of data is crucial to achieving
scalability, availability and low latency in distributed applications.
Conflict-free Replicated Data Types (CRDTs) are important building 
blocks in this domain because they are designed to operate correctly 
under the myriad behaviors possible in a weakly-consistent distributed 
setting. Because of the possibility of concurrent updates to the same 
object at different replicas, and the absence of any ordering guarantees 
on these updates, convergence is an important correctness criterion for 
CRDTs. This property asserts that two replicas which receive the same 
set of updates (in any order) must nonetheless converge to the same 
state. One way to prove that operations on a CRDT converge is to show
that they commute, since commutative actions by definition behave the
same regardless of the order in which they execute. In this paper, we
present a framework for automatically verifying convergence of CRDTs 
under different weak-consistency policies. Surprisingly, depending upon 
the consistency policy supported by the underlying system, we show that 
not all operations of a CRDT need to commute to achieve convergence. 
We develop a proof rule parameterized by a consistency specification 
based on the concepts of commutativity modulo consistency policy and 
non-interference to commutativity. We describe the design and imple- 
mentation of a verification engine equipped with this rule and show how 
it can be used to provide the first automated convergence proofs for a 
number of challenging CRDTs, including sets, lists, and graphs. 


1 Introduction 


For distributed applications, keeping a single copy of data at one location or
multiple fully-synchronized copies (i.e., state-machine replication) at different
locations makes the application susceptible to loss of availability due to
network and machine failures. On the other hand, having multiple un-synchronized
replicas of the data results in high availability, fault tolerance and uniform low 
latency, albeit at the expense of consistency. In the latter case, an update issued 
at one replica can be asynchronously transmitted to other replicas, allowing 
the system to operate continuously even in the presence of network or node 
failures [8]. However, mechanisms must now be provided to ensure replicas are 
kept consistent with each other in the face of concurrent updates and arbitrary 
re-ordering of such updates by the underlying network. 
Over the last few years, Conflict-free Replicated Datatypes (CRDTs) [19-21]
have emerged as a popular solution to this problem. In op-based CRDTs, when
an operation on a CRDT instance is issued at a replica, an effector (basically an
update function) is generated locally, which is then asynchronously transmitted
to (and applied at) all other replicas.¹ A number of CRDTs have since been
developed for common datatypes such as maps, sets, lists, graphs, etc.

The primary correctness criterion for a CRDT implementation is convergence
(sometimes called strong eventual consistency (SEC) [9,20]): two replicas
which have received the same set of effectors must converge to the same CRDT
state. Because of the weak default guarantees assumed to be provided by the 
underlying network, however, we must consider the possibility that effectors 
can be applied in arbitrary order on different replicas, complicating correctness 
arguments. This complexity is further exacerbated because CRDTs impose no 
limitations on how often they are invoked, and may assume additional properties 
on network behaviour [14] that must be taken into account when formulating 
correctness arguments. 

Given these complexities, verifying convergence of operations in a replicated 
setting has proven to be challenging and error-prone [9]. In response, several 
recent efforts have used mechanized proof assistants to yield formal machine- 
checked proofs of correctness [9,24]. While mechanization clearly offers stronger 
assurance guarantees than handwritten proofs, it still demands substantial man- 
ual proof engineering effort to be successful. In particular, correctness arguments 
are typically given in terms of constraints on CRDT states that must be satisfied 
by the underlying network model responsible for delivering updates performed 
by other replicas. Relating the state of a CRDT at one replica with the visibility 
properties allowed by the underlying network has typically involved construct- 
ing an intricate simulation argument or crafting a suitably precise invariant to 
establish convergence. This level of sophisticated reasoning is required for every 
CRDT and consistency model under consideration. There is a notable lack of 
techniques capable of reasoning about CRDT correctness under different weak 
consistency policies, even though such techniques exist for other correctness cri- 
teria such as preservation of state invariants [10,11] or serializability [4,16] under 
weak consistency. 

To overcome these challenges, we propose a novel automated verification 
strategy that does not require complex proof-engineering of handcrafted sim- 
ulation arguments or invariants. Instead, our methodology allows us to directly 
connect constraints on events imposed by the consistency model with con- 
straints on states required to prove convergence. Consistency model constraints 
are extracted from an axiomatization of network behavior, while state con- 
straints are generated using reasoning principles that determine the commuta- 
tivity and non-interference of sequences of effectors, subject to these consistency 
constraints. Both sets of constraints can be solved using off-the-shelf theorem 


1 In this work, we focus on the op-based CRDT model; however, our technique nat- 
urally extends to state-based CRDTs since they can be emulated by an op-based 
model [20]. 
provers. An important advantage of our approach is that it is parametric
on weak consistency schemes; as a result, we are able to analyze the problem of
CRDT convergence under widely different consistency policies (e.g., eventual consistency,
causal consistency, parallel snapshot isolation (PSI) [23], among others), and for 
the first time verify CRDT convergence under such stronger models (efficient 
implementations of which are supported by real-world data stores). A further 
pleasant by-product of our approach is a pathway to take advantage of such 
stronger models to simplify existing CRDT designs and allow composition of 
CRDTs to yield new instantiations for more complex datatypes. 
The paper makes the following contributions: 


1. We present a proof methodology for verifying the correctness of CRDTs 
amenable to automated reasoning. 

2. We allow the proof strategy to be parameterized on a weak consistency spec- 
ification that allows us to state correctness arguments for a CRDT based on 
constraints imposed by these specifications. 

3. We experimentally demonstrate the effectiveness of our proposed verification 
strategy on a number of challenging CRDT implementations across multiple 
consistency schemes. 


Collectively, these contributions yield (to the best of our knowledge) the first 
automated and parameterized proof methodology for CRDT verification. 

The remainder of the paper is organized as follows. In the next section, we 
provide further motivation and intuition for our approach. Section 3 formalizes 
the problem definition, providing an operational semantics and axiomatizations 
of well-known consistency specifications. Section 4 describes our proof strategy 
for determining CRDT convergence that is amenable to automated verification. 
Section 5 provides details about our implementation and experimental results 
justifying the effectiveness of our framework. Section 6 presents related work
and conclusions.


2 Illustrative Example 


We illustrate our approach using a Set CRDT specification as a running
example. A CRDT $(\Sigma, O, \sigma_{init})$ is characterized by a set of states
$\Sigma$, a set of operations $O$, and an initial state $\sigma_{init} \in \Sigma$,
where each operation $o \in O$ is a function with signature
$\Sigma \rightarrow (\Sigma \rightarrow \Sigma)$. The state of a CRDT is replicated,
and when operation $o$ is issued at a replica with state $\sigma$, the effector
$o(\sigma)$ is generated, which is immediately applied at the local replica (which
we also call the source replica) and transmitted to all other replicas, where it is
subsequently applied upon receipt.

$$
\begin{aligned}
\Sigma &= \mathcal{P}(E)\\
\mathsf{Add}(a){:}S &\triangleq \lambda S'.\, S' \cup \{a\}\\
\mathsf{Remove}(a){:}S &\triangleq \lambda S'.\, S' \setminus \{a\}\\
\mathsf{Lookup}(a){:}S &\triangleq a \in S
\end{aligned}
$$

Fig. 1. A simple Set CRDT definition.
Additional constraints on the order in which effectors can be received and
applied at different replicas are specified by a consistency policy, discussed below.
In the absence of any such additional constraints, however, we assume the
underlying network only offers eventually consistent guarantees: all replicas
eventually receive all effectors generated by all other replicas, with no constraints
on the order in which these effectors are received.

Consider the simple Set CRDT definition shown in Fig. 1. Let $E$ be an
arbitrary set of elements. The state space $\Sigma$ is $\mathcal{P}(E)$. Add(a):S denotes
the operation Add(a) applied on a replica with state $S$, which generates an effector
that simply adds $a$ to the state of any replica to which it is applied. Similarly,
Remove(a):S generates an effector that removes $a$ on all replicas to which it is
applied. Lookup(a):S is a query operation which checks whether the queried
element is present in the source replica $S$.

A CRDT is convergent if, during any execution, any two replicas which have
received the same set of effectors have the same state. Our strategy to prove
convergence is to show that any two effectors of the CRDT pairwise commute
with each other modulo a consistency policy, i.e., for two effectors $e_1$ and $e_2$,
$e_1 \circ e_2 = e_2 \circ e_1$. Our simple Set CRDT clearly does not converge when
executed on an eventually consistent data store, since the effectors
$e_1$ = Add(a):$S_1$ and $e_2$ = Remove(a):$S_2$ do not commute, and the semantics
of eventual consistency imposes no additional constraints on the visibility or
ordering of these operations that could be used to guarantee convergence. For
example, if $e_1$ is applied to the state at some replica followed by the application
of $e_2$, the resulting state does not include the element $a$; conversely, applying
$e_2$ to a state at some replica followed by $e_1$ leads to a state that does contain
the element $a$.
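The non-commuting pair can be checked directly by modeling effectors as functions on frozensets. This is a minimal sketch of Fig. 1, assuming Python frozensets as the state space $\mathcal{P}(E)$; the helper names are hypothetical.

```python
def add_effector(a):
    """Effector generated by Add(a):S; it ignores the source state S."""
    return lambda s: s | {a}


def remove_effector(a):
    """Effector generated by Remove(a):S; it also ignores S."""
    return lambda s: s - {a}


def lookup(a, s):
    """Lookup(a):S, evaluated at the source replica."""
    return a in s


e1 = add_effector("a")
e2 = remove_effector("a")
start = frozenset()

state_x = e2(e1(start))  # e1 then e2: frozenset()
state_y = e1(e2(start))  # e2 then e1: frozenset({'a'})
```

Two replicas that receive the same effectors in different orders thus end in different states, which is exactly the convergence failure described above.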

However, while commutativity is a sufficient property to show convergence, it
is not always a necessary one. In particular, different consistency models impose
different constraints on the visibility and ordering of effectors that can obviate
the need to reason about their commutativity. For example, if the consistency
model requires Add(a) and Remove(a) effectors to be applied in the same order
at all replicas, then the Set CRDT will converge. As we will demonstrate later,
the PSI consistency model exactly matches this requirement. To further illustrate
this, consider the definition of the ORSet CRDT shown in Fig. 2. Here, every
element is tagged with a unique identifier (coming from the set $I$). Add(a,i):S
simply adds the element $a$ tagged with $i$,² while Remove(a):S returns an effector
that, when applied to a replica state, will remove all tagged versions of $a$ that
were present in $S$, the source replica.

$$
\begin{aligned}
\Sigma &= \mathcal{P}(E \times I)\\
\mathsf{Add}(a,i){:}S &\triangleq \lambda S'.\, S' \cup \{(a,i)\}\\
\mathsf{Remove}(a){:}S &\triangleq \lambda S'.\, S' \setminus \{(a,i) : (a,i) \in S\}\\
\mathsf{Lookup}(a){:}S &\triangleq \exists i.\, (a,i) \in S
\end{aligned}
$$

Fig. 2. A definition of an ORSet CRDT.


2 Assume that every call to Add uses a unique identifier, which can be easily arranged,
for example by keeping a local counter at every replica which is incremented at every 
operation invocation, and using the id of the replica and the value of the counter as 
a unique identifier. 
Suppose $e_1$ = Add(a,i):$S_1$ and $e_2$ = Remove(a):$S_2$. If $S_2$ does not contain
$(a,i)$, then these two effectors are guaranteed to commute because $e_2$ is unaware
of $(a,i)$ and thus behaves as a no-op with respect to effector $e_1$ when it is
applied to any replica state. Suppose, however, that $e_1$'s effect was visible to
$e_2$; in other words, $e_1$ is applied to $S_2$ before $e_2$ is generated. There are two
possible scenarios that must be considered. (1) Another replica (call it $S_3$) has
$e_2$ applied before $e_1$. Its final state reflects the effect of the Add operation,
while $S_2$'s final state reflects the effect of applying the Remove; clearly,
convergence is violated in this case. (2) All replicas apply $e_1$ and $e_2$ in the
same order; the interesting case here is when the effect of $e_1$ is always applied
before $e_2$ on every replica. The constraint that induces an effector order between
$e_1$ and $e_2$ on every replica as a consequence of $e_1$'s visibility to $e_2$ on $S_2$
is supported by a causally consistent distributed storage model. Under causal
consistency, whenever $e_2$ is applied to a replica state, we are guaranteed that
$e_1$'s effect, which adds $(a,i)$ to the state, would have occurred. Thus, even
though $e_1$ and $e_2$ do not commute when applied to an arbitrary state, their
execution under causal consistency nonetheless allows us to show that all replica
states converge. The essence of our proof methodology is therefore to reason about
commutativity modulo consistency: it is only for those CRDT operations unaffected
by the constraints imposed by the consistency model that proving commutativity
is required. Consistency properties that affect the visibility of effectors are
instead used to guide and simplify our analysis. Applying this notion to pairs of
effectors in arbitrarily long executions requires incorporating commutativity
properties under a more general induction principle, which allows us to generalize
the commutativity of effectors in bounded executions to the unbounded case. This
generalization forms the heart of our automated verification strategy.

Figure 3 defines an ORSet with "tombstone" markers used to keep track of
deleted elements in a separate set. Our proof methodology is sufficient to
automatically show that this CRDT converges under eventual consistency (EC).

$$
\begin{aligned}
\Sigma &= \mathcal{P}(E \times I) \times \mathcal{P}(E \times I)\\
\mathsf{Add}(a,i){:}(A,R) &\triangleq \lambda (A',R').\, (A' \cup \{(a,i)\},\, R')\\
\mathsf{Remove}(a){:}(A,R) &\triangleq \lambda (A',R').\, (A',\, R' \cup \{(a,i) : (a,i) \in A\})\\
\mathsf{Lookup}(a){:}(A,R) &\triangleq \exists i.\, (a,i) \in A \wedge (a,i) \notin R
\end{aligned}
$$

Fig. 3. A variant of the ORSet using tombstones.
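The two ORSet variants can be contrasted on the scenario just described: Remove(a) is issued at a replica that has already seen Add(a,1). The sketch below is a hypothetical Python encoding of Fig. 2 and Fig. 3 (tags are plain ints, effectors are functions on states). It checks this single scenario only and is not a substitute for the general convergence proof.

```python
def apply_all(state, effectors):
    """Apply effectors in the given delivery order."""
    for f in effectors:
        state = f(state)
    return state


# Fig. 2: state is a frozenset of (element, tag) pairs.
def orset_add(a, i):
    return lambda s: s | {(a, i)}


def orset_remove(a, source):
    doomed = frozenset(p for p in source if p[0] == a)  # tagged copies of a seen at the source
    return lambda s: s - doomed


# Fig. 3: state is a pair (A, R) of added and removed (tombstoned) tagged elements.
def tomb_add(a, i):
    return lambda st: (st[0] | {(a, i)}, st[1])


def tomb_remove(a, source):
    doomed = frozenset(p for p in source[0] if p[0] == a)
    return lambda st: (st[0], st[1] | doomed)


def tomb_lookup(a, st):
    added, removed = st
    return any(x == a and (x, t) not in removed for (x, t) in added)


# Plain ORSet: the outcome depends on the delivery order.
empty = frozenset()
e1 = orset_add("a", 1)
e2 = orset_remove("a", e1(empty))
causal_order = apply_all(empty, [e1, e2])  # frozenset(): a is gone
other_order = apply_all(empty, [e2, e1])   # {('a', 1)}: a reappears

# Tombstone ORSet: both delivery orders reach the same state.
t_init = (frozenset(), frozenset())
t1 = tomb_add("a", 1)
t2 = tomb_remove("a", t1(t_init))
tomb_states = {apply_all(t_init, [t1, t2]), apply_all(t_init, [t2, t1])}
```

The plain ORSet needs causal delivery to rule out the second order, whereas the tombstone variant reaches a single state in which Lookup(a) is false regardless of order.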


3 Problem Definition 


In this section, we formalize the problem of determining convergence in CRDTs
parametric in a weak consistency policy. First, we define a general operational
semantics to describe all valid executions of a CRDT under any given weak
consistency policy. As stated earlier, a CRDT program $P$ is specified by the
tuple $(\Sigma, O, \sigma_{init})$. Here, we find it convenient to define an operation
$o \in O$ as a function $(\Sigma \times (\Sigma \rightarrow \Sigma)^*) \rightarrow (\Sigma \rightarrow \Sigma)$.
Instead of directly taking as input a generating state, operations are now defined
to take as input a start state and a sequence of effectors. The intended semantics
is that the sequence of effectors would be applied to the start state to obtain the
generating state. This syntax allows us to simplify the presentation of the proof
methodology in the next section, since we can abstract a history of effectors into
an equivalent start state.

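Formally, if $\hat{o} : \Sigma \rightarrow (\Sigma \rightarrow \Sigma)$ was the original op-based
definition, then we define the operation
$o : (\Sigma \times (\Sigma \rightarrow \Sigma)^*) \rightarrow (\Sigma \rightarrow \Sigma)$ as follows:

$$
\forall \sigma.\; o(\sigma, \epsilon) = \hat{o}(\sigma)
\qquad\qquad
\forall \sigma, \pi, f.\; o(\sigma, \pi \cdot f) = o(f(\sigma), \pi)
$$

Note that $\epsilon$ indicates the empty sequence. Hence, for all states $\sigma$ and
sequences of functions $\pi$, we have $o(\sigma, \pi) = \hat{o}(\pi(\sigma))$.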
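The lifting of $\hat{o}$ to $o$ can be written as a small higher-order function. In this sketch (hypothetical names, Python lists as effector sequences) the recursion $o(\sigma, \pi \cdot f) = o(f(\sigma), \pi)$ is unrolled into a loop that consumes the sequence from the right, so the last effector in the list is applied first.

```python
def lift(o_hat):
    """Turn o_hat : S -> (S -> S) into o : (S x effector sequence) -> (S -> S)."""
    def o(sigma, pi):
        # o(sigma, pi . f) = o(f(sigma), pi): the LAST effector in pi is applied first.
        for f in reversed(pi):
            sigma = f(sigma)
        return o_hat(sigma)  # o(sigma, epsilon) = o_hat(sigma)
    return o


# Example o_hat whose effector simply returns the generating state it was built from,
# which makes the generating state observable.
capture = lift(lambda s: (lambda _: s))
add_x = lambda s: s | {"x"}
rm_x = lambda s: s - {"x"}
```

For instance, capture(frozenset(), [add_x, rm_x]) first removes and then adds "x", so the generating state it exposes is frozenset({'x'}).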

To define the operational semantics, we abstract away from the concept of 
replicas, and instead maintain a global pool of effectors. A new CRDT opera- 
tion is executed against a CRDT state obtained by first selecting a subset of 
effectors from the global pool and then applying the elements in that set in 
some non-deterministically chosen permutation to the initial CRDT state. The 
choice of effectors and their permutation must obey the weak consistency policy 
specification. Given a CRDT $P = (\Sigma, O, \sigma_{init})$ and a weak consistency policy
$\Psi$, we define a labeled transition system $S_{P,\Psi} = (\mathcal{C}, \rightarrow)$, where $\mathcal{C}$ is a set of
configurations and $\rightarrow$ is the transition relation. A configuration $c = (\Delta, vis, eo)$
consists of three components: $\Delta$ is a set of events, $vis \subseteq \Delta \times \Delta$ is a visibility
relation, and $eo \subseteq \Delta \times \Delta$ is a global effector order relation (constrained to be
anti-symmetric). An event $\eta \in \Delta$ is a tuple $(eid, o, \sigma_s, \Delta_r, eo_r)$, where $eid$ is a
unique event id, $o \in O$ is a CRDT operation, $\sigma_s \in \Sigma$ is the start CRDT state,
$\Delta_r$ is the set of events visible to $\eta$ (also called the history of $\eta$), and $eo_r$ is a
total order on the events in $\Delta_r$ (also called the local effector order relation). We
assume projection functions for each component of an event (for example, $\sigma_s(\eta)$
projects the start state of the event $\eta$).

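Given an event $\eta = (eid, o, \sigma_s, \Delta_r, eo_r)$, we define $\eta^e$ to be the effector
associated with the event. This effector is obtained by executing the CRDT operation
$o$ against the start CRDT state $\sigma_s$ and the sequence of effectors obtained from
the events in $\Delta_r$, arranged in the reverse order of $eo_r$. Formally,

$$
\eta^{e} =
\begin{cases}
o(\sigma_s, \epsilon) & \text{if } \Delta_r = \emptyset\\[4pt]
o\Big(\sigma_s, \prod_{i=1}^{k} \eta_{P(i)}^{e}\Big) & \text{if } \Delta_r = \{\eta_1, \ldots, \eta_k\}, \text{ where } P : \{1,\ldots,k\} \rightarrow \{1,\ldots,k\}\\
 & \quad \text{is a permutation with } \forall i, j.\; i < j \Rightarrow (\eta_{P(j)}, \eta_{P(i)}) \in eo_r
\end{cases}
\tag{1}
$$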


In the above definition, when Δ_r is non-empty, we define a permutation P of the events in Δ_r such that the permutation order is the inverse of the effector order eo_r. This ensures that if (η_i, η_j) ∈ eo_r, then η_j^e occurs before η_i^e in the sequence passed to the CRDT operation o, effectively applying η_i before η_j to obtain the generating state for o.
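To make the role of P concrete, here is a small Python sketch (ours, not the paper's implementation). Given a history and a local effector order eo_r, represented as a set of pairs that is assumed to form a strict total order, it produces a sequence satisfying the side condition of Eq. (1), then computes the generating state by applying that sequence right to left. The event names and effectors are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

State = FrozenSet[str]
Effector = Callable[[State], State]


def inverse_eo_sequence(history: Set[str], eo_r: Set[Tuple[str, str]]) -> List[str]:
    """Arrange `history` so that for positions i < j, (seq[j], seq[i]) is in eo_r.
    Assumes eo_r is a strict total order on `history`, given as a set of pairs."""
    # In a strict total order, the rank of e is the number of events preceding it.
    rank = {e: sum((d, e) in eo_r for d in history) for e in history}
    return sorted(history, key=lambda e: rank[e], reverse=True)


def generating_state(start: State, history: Set[str], eo_r: Set[Tuple[str, str]],
                     effector_of: Dict[str, Effector]) -> State:
    """State against which the new operation runs: the sequence from
    inverse_eo_sequence is applied right to left, i.e. in eo_r order."""
    state = start
    for e in reversed(inverse_eo_sequence(history, eo_r)):
        state = effector_of[e](state)
    return state


effs = {"n1": lambda s: s | {"a"},   # hypothetical Add(a) effector
        "n2": lambda s: s - {"a"}}   # hypothetical Remove(a) effector
hist = {"n1", "n2"}
print(inverse_eo_sequence(hist, {("n1", "n2")}))                  # ['n2', 'n1']
print(generating_state(frozenset(), hist, {("n1", "n2")}, effs))  # frozenset()
print(generating_state(frozenset(), hist, {("n2", "n1")}, effs))  # frozenset({'a'})
```

Flipping eo_r flips which effector acts last, which is exactly why the generating state (and hence η^e) depends on the local effector order.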
The following rule describes the transitions allowed in S_{P,Ψ}:


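$$
\frac{\begin{array}{c}
\Delta_r \subseteq \Delta \quad o \in O \quad \sigma_s \in \Sigma \quad eo_r \text{ is a total order on } \Delta_r \quad eo \subseteq eo_r \\
\text{fresh } id \quad \eta = (id, o, \sigma_s, \Delta_r, eo_r) \\
\Delta' = \Delta \cup \{\eta\} \quad vis' = vis \cup \{(\eta', \eta) \mid \eta' \in \Delta_r\} \quad \Psi(\Delta', vis', eo')
\end{array}}{(\Delta, vis, eo) \xrightarrow{\eta} (\Delta', vis', eo')}
$$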


The rule describes the effect of executing a new operation o, which begins by first selecting a subset of already completed events (Δ_r) and a total order eo_r on these events which obeys the global effector order eo. This mimics applying the operation o on an arbitrary replica on which the events of Δ_r have been applied in the order eo_r. A new event (η) corresponding to the issued operation o is computed, which is used to label the transition and is also added to the current configuration. All the events in Δ_r are visible to the new event η, which is reflected in the new visibility relation vis'. The system moves to the new configuration (Δ', vis', eo'), which must satisfy the consistency policy Ψ. Note that even though the general transition rule allows the event to pick any arbitrary start state σ_s, we restrict the start state of all events in a well-formed execution to be the initial CRDT state σ_init, i.e., the state in which all replicas begin their execution. A trace of S_{P,Ψ} is a sequence of transitions. Let ⟦S_{P,Ψ}⟧ be the set of all finite traces. Given a trace τ, L(τ) denotes all events (i.e., labels) in τ.


Definition 1 (Well-formed Execution). A trace τ ∈ ⟦S_{P,Ψ}⟧ is a well-formed execution if it begins from the empty configuration C_init = ({}, {}, {}) and ∀η ∈ L(τ). σ_s(η) = σ_init.
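The transition rule and Definition 1 can be prototyped directly. The Python sketch below is illustrative only, and its representation choices are our assumptions: events carry just the fields the rule needs, histories and orders are sets of integer event ids, the policy Ψ is passed in as a predicate over (events, vis, eo), the caller supplies eo', and the premise eo ⊆ eo_r is read as a constraint on the pairs of eo whose events both lie in Δ_r.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple

Pair = Tuple[int, int]
Policy = Callable[[Set[int], Set[Pair], Set[Pair]], bool]


@dataclass(frozen=True)
class Event:
    eid: int
    op: str
    start: FrozenSet[str]        # sigma_s
    history: FrozenSet[int]      # Delta_r, as event ids
    local_eo: FrozenSet[Pair]    # eo_r


@dataclass
class Config:
    events: Dict[int, Event] = field(default_factory=dict)
    vis: Set[Pair] = field(default_factory=set)
    eo: Set[Pair] = field(default_factory=set)


def is_strict_total_order(xs: FrozenSet[int], rel: FrozenSet[Pair]) -> bool:
    on_xs = all(a in xs and b in xs and a != b for a, b in rel)
    total = all((a, b) in rel or (b, a) in rel for a in xs for b in xs if a != b)
    antisymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel for a, b in rel for c, d in rel if b == c)
    return on_xs and total and antisymmetric and transitive


def step(cfg: Config, op: str, start: FrozenSet[str], history: FrozenSet[int],
         local_eo: FrozenSet[Pair], new_eo: Set[Pair], psi: Policy) -> Optional[Config]:
    """Apply the transition rule once; return None if a premise fails."""
    if not history.issubset(cfg.events):
        return None
    if not is_strict_total_order(history, local_eo):
        return None
    # Assumed reading of the premise "eo <= eo_r": eo pairs inside the history.
    if any(a in history and b in history and (a, b) not in local_eo for a, b in cfg.eo):
        return None
    eid = max(cfg.events, default=0) + 1                # fresh id
    events = {**cfg.events, eid: Event(eid, op, start, history, local_eo)}
    vis = cfg.vis | {(h, eid) for h in history}
    if not psi(set(events), vis, set(new_eo)):
        return None
    return Config(events, vis, set(new_eo))


def well_formed(cfg: Config, sigma_init: FrozenSet[str]) -> bool:
    """Definition 1 on the final configuration of a trace from the empty one."""
    return all(e.start == sigma_init for e in cfg.events.values())


def ec(events: Set[int], vis: Set[Pair], eo: Set[Pair]) -> bool:
    return not eo                                        # Eventual Consistency


init: FrozenSet[str] = frozenset()
c1 = step(Config(), "Add(a)", init, frozenset(), frozenset(), set(), ec)
c2 = step(c1, "Remove(a)", init, frozenset({1}), frozenset(), set(), ec)
print(sorted(c2.vis), well_formed(c2, init))                                 # [(1, 2)] True
print(step(c2, "Add(b)", init, frozenset({1, 2}), frozenset(), set(), ec))  # None
```

The last call fails because an empty eo_r is not a total order on a two-event history, so the premise of the rule is not met.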


Let WF(S_{P,Ψ}) denote all well-formed executions of S_{P,Ψ}. The consistency policy Ψ(Δ, vis, eo) is a formula constraining the events in Δ and the relations vis and eo defined over these events. Below, we illustrate how to express certain well-known consistency policies in our framework:


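$$
\begin{array}{ll}
\text{Consistency scheme} & \Psi(\Delta, vis, eo) \\ \hline
\text{Eventual Consistency [3]} & \forall \eta, \eta' \in \Delta.\ \neg eo(\eta, \eta') \\
\text{Causal Consistency [14]} & \forall \eta, \eta' \in \Delta.\ vis(\eta, \eta') \Rightarrow eo(\eta, \eta') \\
 & \wedge\ \forall \eta, \eta', \eta'' \in \Delta.\ vis(\eta, \eta') \wedge vis(\eta', \eta'') \Rightarrow vis(\eta, \eta'') \\
\text{RedBlue Consistency } (O_r) \text{ [13]} & \forall \eta, \eta' \in \Delta.\ (o(\eta) \in O_r \wedge o(\eta') \in O_r \wedge vis(\eta, \eta')) \Leftrightarrow eo(\eta, \eta') \\
 & \wedge\ \forall \eta, \eta' \in \Delta.\ o(\eta) \in O_r \wedge o(\eta') \in O_r \Rightarrow vis(\eta, \eta') \vee vis(\eta', \eta) \\
\text{Parallel Snapshot Isolation [23]} & \forall \eta, \eta' \in \Delta.\ (Wr(\eta^e) \cap Wr({\eta'}^{e}) \neq \emptyset \wedge vis(\eta, \eta')) \Leftrightarrow eo(\eta, \eta') \\
 & \wedge\ \forall \eta, \eta' \in \Delta.\ Wr(\eta^e) \cap Wr({\eta'}^{e}) \neq \emptyset \Rightarrow vis(\eta, \eta') \vee vis(\eta', \eta) \\
\text{Strong Consistency} & \forall \eta, \eta' \in \Delta.\ vis(\eta, \eta') \Rightarrow eo(\eta, \eta') \\
 & \wedge\ \forall \eta, \eta' \in \Delta.\ vis(\eta, \eta') \vee vis(\eta', \eta)
\end{array}
$$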
For Eventual Consistency (EC) [3], we do not place any constraints on the visibility order and require the global effector order to be empty. This reflects the fact that in EC, any number of events can occur concurrently at different replicas, and hence a replica can witness any arbitrary subset of events which may be applied in any order. In Causal Consistency (CC) [14], an event is applied at a replica only if all causally dependent events have already been applied. An event η1 is causally dependent on η2 if η1 was generated at a replica where either η2 or any other event causally dependent on η2 had already been applied. The visibility relation vis captures causal dependency, and by making vis transitive, we ensure that all causal dependencies of events in Δ_r are also present in Δ_r (this is because in the transition rule, Ψ is checked on the updated visibility relation, which relates events in Δ_r with the newly generated event). Further, causally dependent events must be applied in the same order at all replicas, which we capture by asserting that vis implies eo. In RedBlue Consistency (RB) [13], a subset of CRDT operations (O_r ⊆ O) is synchronized, so that these operations must occur in the same order at all replicas. We express RB in our framework by requiring the visibility relation to be total among events whose operations are in O_r. In Parallel Snapshot Isolation (PSI) [23], two events which conflict with each other (because they write to a common variable) are not allowed to be executed concurrently, but are synchronized across all replicas to be executed in the same order. Similar to [10], we assume that when a CRDT is used under PSI, its state space Σ is a map from variables to values, and every operation generates an effector which simply writes to certain variables. We assume that Wr(η^e) returns the set of variables written by the effector η^e, and express PSI in our framework by requiring that events which write a common variable are applied in the same order (determined by their visibility relation) across all replicas; furthermore, the policy requires that the visibility relation among such events is total. Finally, in Strong Consistency (SC), the visibility relation is total and all effectors are applied in the same order at all replicas.
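The policies in the table above are predicates over finite relations, so they can be evaluated directly on a small configuration. The Python sketch below encodes EC, CC, RB and SC this way (PSI is omitted because it also needs write sets). Relations are sets of pairs of event ids, and, as an assumption of this sketch, the totality conjuncts are checked over distinct events only; the operation names are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[int, int]
Policy = Callable[[Set[int], Set[Pair], Set[Pair]], bool]


def ec(events: Set[int], vis: Set[Pair], eo: Set[Pair]) -> bool:
    """Eventual consistency: the global effector order is empty."""
    return not eo


def cc(events: Set[int], vis: Set[Pair], eo: Set[Pair]) -> bool:
    """Causal consistency: vis implies eo, and vis is transitive."""
    transitive = all((a, d) in vis for a, b in vis for c, d in vis if b == c)
    return vis <= eo and transitive


def rb(op: Dict[int, str], red: Set[str]) -> Policy:
    """RedBlue consistency for the synchronized ("red") operations O_r."""
    def policy(events: Set[int], vis: Set[Pair], eo: Set[Pair]) -> bool:
        is_red = {e: op[e] in red for e in events}
        eo_is_red_vis = all(
            (is_red[a] and is_red[b] and (a, b) in vis) == ((a, b) in eo)
            for a in events for b in events)
        red_total = all((a, b) in vis or (b, a) in vis
                        for a in events for b in events
                        if a != b and is_red[a] and is_red[b])
        return eo_is_red_vis and red_total
    return policy


def sc(events: Set[int], vis: Set[Pair], eo: Set[Pair]) -> bool:
    """Strong consistency: vis implies eo, and vis is total on distinct events."""
    total = all((a, b) in vis or (b, a) in vis
                for a in events for b in events if a != b)
    return vis <= eo and total


events = {1, 2}
concurrent = (events, set(), set())
ordered = (events, {(1, 2)}, {(1, 2)})
red_ops = rb({1: "Add(a)", 2: "Remove(a)"}, {"Add(a)", "Remove(a)"})
for name, psi in [("EC", ec), ("CC", cc), ("RB", red_ops), ("SC", sc)]:
    print(name, psi(*concurrent), psi(*ordered))
# EC True False
# CC True True
# RB False True
# SC False True
```

Two concurrent red events are rejected by RB and SC, while EC forbids any effector order at all; this mirrors the prose description of each policy.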

Given an execution τ ∈ ⟦S_{P,Ψ}⟧ and a transition $C \xrightarrow{\eta} C'$ in τ, we associate a set of replica states Σ_η that the event can potentially witness, by considering all permutations of the effectors visible to η which obey the global effector order, when applied to the start state σ_s(η). Formally, this is defined as follows, assuming η = (eid, o, σ_s, {η_1, ..., η_k}, eo_r) and C = (Δ, vis, eo):


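$$
\Sigma_\eta = \{\, \eta^e_{P(1)} \circ \eta^e_{P(2)} \circ \cdots \circ \eta^e_{P(k)}(\sigma_s) \mid P : \{1,\ldots,k\} \to \{1,\ldots,k\},\ eo_P \text{ is a total order},\ i < j \Rightarrow (\eta_{P(j)}, \eta_{P(i)}) \in eo_P,\ eo \subseteq eo_P \,\}
$$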


In the above definition, for all valid local effector orders eo_P, we compute the CRDT states obtained on applying those effectors on the start CRDT state, which constitute Σ_η. The original event η presumably would have witnessed one of these states.
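A brute-force reading of this definition enumerates the permutations of the visible effectors, keeps those whose order respects the global effector order eo, applies each surviving sequence to σ_s, and collects the results; Definition 2 below then asks whether the collection is a singleton. The Python sketch below does this for small histories only. It represents eo as a set of pairs and lists each order earliest first (the reverse of the P indexing above); the Add/Remove effectors are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Set, Tuple

State = FrozenSet[str]
Effector = Callable[[State], State]
Pair = Tuple[str, str]


def witnessable_states(start: State, history: Set[str], eo: Set[Pair],
                       effector_of: Dict[str, Effector]) -> Set[State]:
    """Sigma_eta: apply the visible effectors to `start` in every order
    (earliest first) that respects the global effector order eo."""
    results: Set[State] = set()
    for order in permutations(sorted(history)):
        position = {e: i for i, e in enumerate(order)}
        # Pairs of eo that mention events outside this history are ignored.
        if all(position[a] < position[b] for a, b in eo
               if a in position and b in position):
            state = start
            for e in order:
                state = effector_of[e](state)
            results.add(state)
    return results


def is_convergent(start: State, history: Set[str], eo: Set[Pair],
                  effector_of: Dict[str, Effector]) -> bool:
    """Definition 2: the event is convergent iff Sigma_eta is a singleton."""
    return len(witnessable_states(start, history, eo, effector_of)) == 1


effs = {"add": lambda s: s | {"a"},     # hypothetical Add(a) effector
        "rem": lambda s: s - {"a"}}     # hypothetical Remove(a) effector
print(is_convergent(frozenset(), {"add", "rem"}, set(), effs))             # False
print(is_convergent(frozenset(), {"add", "rem"}, {("add", "rem")}, effs))  # True
```

The enumeration is factorial in the size of the history, which is one reason the paper does not check convergence this way and instead reduces it to pairwise commutativity.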


Definition 2 (Convergent Event). Given an execution τ ∈ ⟦S_{P,Ψ}⟧ and an event η ∈ L(τ), η is convergent if Σ_η is a singleton.
Definition 3 (Strong Eventual Consistency). A CRDT P = (Σ, O, σ_init) achieves strong eventual consistency (SEC) under a weak consistency specification Ψ if for all well-formed executions τ ∈ WF(S_{P,Ψ}) and for all events η ∈ L(τ), η is convergent.


An event is convergent if all valid permutations of visible events according to 
the specification Ψ lead to the same state. This corresponds to the requirement
that if two replicas have witnessed the same set of operations, they must be in the 
same state. A CRDT achieves SEC if all events in all executions are convergent. 


4 Automated Verification 


In order to show that a CRDT achieves SEC under a consistency specification, 
we need to show that all events in any execution are convergent, which in turn 
requires us to show that any valid permutation of valid subsets of events in an 
execution leads to the same state. This is a hard problem because we have to 
reason about executions of unbounded length, involving unbounded sets of effec- 
tors and reconcile the declarative event-based specifications of weak consistency 
with states generated during execution. To make the problem tractable, we use 
a two-fold strategy. First, we show that if any pair of effectors generated during 
any execution either commute with each other or are forced to be applied in the 
same order by the consistency policy, then the CRDT achieves SEC. Second, we 
develop an inductive proof rule to show that all pairs of effectors generated dur- 
ing any (potentially unbounded) execution obey the above mentioned property. 
To ensure soundness of the proof rule, we place some reasonable assumptions on the consistency policy that (intuitively) require behaviorally equivalent events to be treated the same by the policy, regardless of context (i.e., the length of the execution history at the time the event is applied). We then extract a simple sufficient condition, which we call non-interference to commutativity, that captures the heart of the inductive argument. Notably, this condition can be automatically checked for different CRDTs under different consistency policies using off-the-shelf theorem provers, thus providing a pathway to performing automated parameterized verification of CRDTs.

Given a transition $(\Delta, vis, eo) \xrightarrow{\eta} C$, we denote the global effector order in the starting configuration of η, i.e., eo, as eo_η. We first show that a sufficient condition for an event to be convergent is that any two events in its history either commute or are related by the global effector order.


Lemma 1. Given an execution τ ∈ ⟦S_{P,Ψ}⟧ and an event η = (id, o, σ_s, Δ_r, eo_r) ∈ L(τ), if for all η1, η2 ∈ Δ_r such that η1 ≠ η2, either η1^e ∘ η2^e = η2^e ∘ η1^e or eo_η(η1, η2) or eo_η(η2, η1), then η is convergent.


3 All proofs can be found in the extended version [15] of the paper. 
We now present a property that consistency policies must obey for our verification methodology to be soundly applied. First, we define the notion of behavioral equivalence of events:


Definition 4 (Behavioral Equivalence).
Two events η1 = (id1, o1, σ1, Δ1, eo1) and η2 = (id2, o2, σ2, Δ2, eo2) are behaviorally equivalent if η1^e = η2^e and o1 = o2.


That is, behaviorally equivalent events produce the same effectors. We use the notation η1 ≡ η2 to indicate that they are behaviorally equivalent.


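Definition 5 (Behaviorally Stable Consistency Policy). A consistency policy Ψ is behaviorally stable if ∀Δ, vis, eo, Δ', vis', eo', η1, η2 ∈ Δ, η1', η2' ∈ Δ', the following holds:

$$
\big(\Psi(\Delta, vis, eo) \wedge \Psi(\Delta', vis', eo') \wedge \eta_1 \equiv \eta_1' \wedge \eta_2 \equiv \eta_2' \wedge (vis(\eta_1, \eta_2) \Leftrightarrow vis'(\eta_1', \eta_2'))\big) \Rightarrow \big(eo(\eta_1, \eta_2) \Leftrightarrow eo'(\eta_1', \eta_2')\big) \tag{2}
$$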


Behaviorally stable consistency policies treat behaviorally equivalent events 
which have the same visibility relation among them in the same manner by 
enforcing the same effector order. All consistency policies that we discussed in 
the previous section (representing the most well-known in the literature) are 
behaviorally stable: 


Lemma 2. EC, CC, PSI, RB and SC are behaviorally stable. 


EC does not enforce any effector ordering and hence is trivially stable behav- 
iorally. CC forces causally dependent events to be in the same order, and hence 
behaviorally equivalent events which have the same visibility order will be forced 
to be in the same effector order. RB forces events whose operations belong to a specific subset to be in the same order, but since behaviorally equivalent events perform the same operation, they would be forced into the same effector ordering. Similarly, PSI forces events writing to a common variable to be in the same order, but since behaviorally equivalent events generate the same effector, they would also write to the same variables and hence would be forced into the same
effector order. SC forces all events to be in the same order which is equal to 
the visibility order, and hence is trivially stable behaviorally. In general, behav- 
iorally stable consistency policies do not consider the context in which events 
occur, but instead rely only on observable behavior of the events to constrain 
their ordering. A simple example of a consistency policy which is not behav- 
iorally stable is a policy which maintains bounded concurrency [12] by limiting 
the number of concurrent operations across all replicas to a fixed bound. Such 
a policy would synchronize two events only if they occur in a context where 
keeping them concurrent would violate the bound, but behaviorally equivalent 
events in a different context may not be synchronized. 

For executions under a behaviorally stable consistency policy, the global effector order between events only grows in an execution, so that if two events η1 and η2 in the history of some event η are related by eo_η, then if they later occur in the history of any other event, they would be related in the same effector order. Hence, we can now define a common global effector order for an execution. Given an execution τ ∈ ⟦S_{P,Ψ}⟧, the effector order eo_τ ⊆ L(τ) × L(τ) is an anti-symmetric relation defined as follows:


$$eo_\tau = \{(\eta_1, \eta_2) \mid \exists \eta \in L(\tau).\ (\eta_1, \eta_2) \in eo_\eta\}$$


Similarly, we also define vis_τ to be the common visibility relation for an execution τ, which is nothing but the vis relation in the final configuration of τ.


Definition 6 (Commutative modulo Consistency Policy). Given a CRDT P, a behaviorally stable weak consistency specification Ψ and an execution τ ∈ ⟦S_{P,Ψ}⟧, two events η1, η2 ∈ L(τ) such that η1 ≠ η2 commute modulo the consistency policy Ψ if either η1^e ∘ η2^e = η2^e ∘ η1^e or eo_τ(η1, η2) or eo_τ(η2, η1).
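Checking Definition 6 amounts to an ordering test followed by a functional-equality test. In the Python sketch below, effector equality is approximated by comparing outputs on an explicitly supplied list of states, which is exact only when that list is the whole (finite) state space; the two-element universe {a, b} and the Simple-Set-style effectors are assumptions made for illustration.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

State = FrozenSet[str]
Effector = Callable[[State], State]


def all_states(universe: Iterable[str]) -> List[State]:
    """Every subset of a finite universe, i.e. the state space of a tiny set CRDT."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def commute_modulo(e1: Effector, e2: Effector, n1: str, n2: str,
                   eo_tau: Set[Tuple[str, str]], states: List[State]) -> bool:
    """Definition 6: ordered by eo_tau, or e1 . e2 == e2 . e1 on every given state."""
    if (n1, n2) in eo_tau or (n2, n1) in eo_tau:
        return True
    return all(e1(e2(s)) == e2(e1(s)) for s in states)


add_a = lambda s: s | {"a"}
rem_a = lambda s: s - {"a"}
add_b = lambda s: s | {"b"}
space = all_states(["a", "b"])
print(commute_modulo(add_a, add_b, "n1", "n2", set(), space))           # True
print(commute_modulo(add_a, rem_a, "n1", "n2", set(), space))           # False
print(commute_modulo(add_a, rem_a, "n1", "n2", {("n1", "n2")}, space))  # True
```

The last call shows the "modulo" part: a non-commuting pair is still acceptable once the policy places it in eo_τ.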


The following lemma is a direct consequence of Lemma 1: 


Lemma 3. Given a CRDT P and a behaviorally stable consistency specification Ψ, if for all τ ∈ WF(S_{P,Ψ}), for all η1, η2 ∈ L(τ) such that η1 ≠ η2, η1 and η2 commute modulo the consistency policy Ψ, then P achieves SEC under Ψ.


Our goal is to use Lemma 3 to show that all events in any execution commute 
modulo the consistency policy. However, executions can be arbitrarily long and 
have an unbounded number of events. Hence, for events occurring in such large 
executions, we will instead consider behaviorally equivalent events in a smaller 
execution and show that they commute modulo the consistency policy, which by 
stability of the consistency policy directly translates to their commutativity in 
the larger context. Recall that the effector generated by an operation depends 
on its start state and the sequence of other effectors applied to that state. To 
generate behaviorally equivalent events with arbitrarily long histories in short 
executions, we summarize these long histories into the start state of events, and 
use commutativity itself as an inductive property of these start states. That is, 
we ask whether, if two events with arbitrary start states and empty histories commute modulo Ψ, the addition of another event to their histories would continue to allow them to commute modulo Ψ.


Definition 7 (Non-interference to Commutativity). (Non-Interf) A CRDT P = (Σ, O, σ_init) satisfies non-interference to commutativity under a consistency policy Ψ if and only if the following conditions hold:

1. For all executions $C_{init} \xrightarrow{\eta_1} C_1 \xrightarrow{\eta_2} C_2$ in WF(S_{P,Ψ}), η1 and η2 commute modulo Ψ.

2. For all σ1, σ2, σ3 ∈ Σ, if for the execution $\tau = C_{init} \xrightarrow{\eta_1} C_1 \xrightarrow{\eta_2} C_2$ in ⟦S_{P,Ψ}⟧ where σ_s(η1) = σ1 and σ_s(η2) = σ2, η1 and η2 commute modulo Ψ, then for all executions $\tau' = C_{init} \xrightarrow{\eta_3} C_1' \xrightarrow{\eta_1'} C_2' \xrightarrow{\eta_2'} C_3'$ such that σ_s(η1') = σ1, σ_s(η2') = σ2, σ_s(η3) = σ3, and vis_τ(η1, η2) ⇔ vis_τ'(η1', η2'), η1' and η2' commute modulo Ψ.
Condition (1) corresponds to the base case of our inductive argument and requires that in well-formed executions with 2 events, both the events commute modulo Ψ. For condition (2), our intention is to consider two events η_a and η_b with any arbitrary histories which can occur in any well-formed execution and, assuming that they commute modulo Ψ, show that even after the addition of another event to their histories, they continue to commute. We use CRDT states σ1, σ2 to summarize the histories of the two events, and construct behaviorally equivalent events (η1 ≡ η_a and η2 ≡ η_b) which would take σ1, σ2 as their start states. That is, if η_a produced the effector⁴ o(σ_init, π), where o is the CRDT operation corresponding to η_a and π is the sequence of effectors in its history, we leverage the observation that o(σ_init, π) = o(π(σ_init), ε), and assuming σ1 = π(σ_init), we obtain the behaviorally equivalent event η1, i.e., η1^e = η_a^e. Similar analysis establishes that η2^e = η_b^e. However, since we have no way of characterizing states σ1 and σ2 which are obtained by applying arbitrary sequences of effectors, we use commutativity itself as an identifying characteristic, focusing on only those σ1 and σ2 for which the events η1 and η2 commute modulo Ψ.

The interfering event is also summarized by another CRDT state σ3, and we require that after suffering interference from this new event, the original two events would continue to commute modulo Ψ. This would essentially establish that any two events with any history would commute modulo Ψ in these small executions, which by the behavioral stability of Ψ would translate to their commutativity in any execution.


Theorem 1. Given a CRDT P and a behaviorally stable consistency policy Ψ, if P satisfies non-interference to commutativity under Ψ, then P achieves SEC under Ψ.


Example: Let us apply the proposed verification strategy to the ORSet CRDT shown in Fig. 2. Under EC, condition (1) of Non-Interf fails, because in the execution $C_{init} \xrightarrow{\eta_1} C_1 \xrightarrow{\eta_2} C_2$ where o(η1) = Add(a,i), o(η2) = Remove(a) and vis(η1, η2), η1 and η2 don't commute modulo EC, since (a,i) would be present in the source replica of Remove(a). However, η1 and η2 would commute modulo CC, since they would be related by the effector order. Now, moving to condition (2) of Non-Interf, we limit ourselves to source replica states σ1 and σ2 where Add(a,i) and Remove(a) do commute modulo CC. If vis_τ(η1, η2), then after interference, in execution τ', vis_τ'(η1', η2'), in which case η1' and η2' trivially commute modulo CC (because they would be related by the effector order). On the other hand, if ¬vis_τ(η1, η2), then for η1 and η2 to commute modulo CC, we must have that the effectors η1^e and η2^e themselves commute, which implies that (a,i) ∉ σ2. Now, consider any execution τ' with an interfering operation η3. If η3 is another Add(a,i') operation, then i' ≠ i, so that even if it is visible to η2', η2'^e will not remove (a,i), and hence η1' and η2' would commute. Similarly, if η3 is another Remove(a) operation, it can only remove tagged versions of a from the source replica of η2', so that the effector η2'^e would not remove (a,i).
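The two facts used in this example can be checked on a toy model. The Python sketch below makes several assumptions: an ORSet state is a set of (element, tag) pairs, the effector of Add(a, i) inserts (a, i), and the effector of Remove(a) deletes the tagged copies of a present in its source state. It is a simplification of Fig. 2 rather than the figure itself, and the tag names "i" and "j" are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Tagged = Tuple[str, str]
State = FrozenSet[Tagged]
Effector = Callable[[State], State]


def add_eff(a: str, tag: str) -> Effector:
    """Effector of Add(a, tag): insert the tagged copy (a, tag)."""
    return lambda s: s | {(a, tag)}


def remove_eff(a: str, source: State) -> Effector:
    """Effector of Remove(a): delete the tagged copies of a seen at the source."""
    seen = frozenset(p for p in source if p[0] == a)
    return lambda s: s - seen


def commute_on(e1: Effector, e2: Effector, states) -> bool:
    return all(e1(e2(s)) == e2(e1(s)) for s in states)


init: State = frozenset()
eta1 = add_eff("a", "i")
samples = [init, frozenset({("a", "i")}), frozenset({("a", "j")})]

# Condition (1) under EC: Remove(a) was issued after seeing Add(a, i).
eta2_vis = remove_eff("a", eta1(init))
print(commute_on(eta1, eta2_vis, samples))     # False

# Condition (2) setting: (a, i) is not in sigma2, even though an interfering
# Add(a, j) with a fresh tag j was seen first, so the effectors commute.
sigma2 = add_eff("a", "j")(init)
eta2_after = remove_eff("a", sigma2)
print(commute_on(eta1, eta2_after, samples))   # True
```

The second check commutes for every state, not just the samples, because the removed set {(a, j)} never contains the inserted pair (a, i).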


4 Note that in a well-formed execution, the start state is always σ_init.
5 Experimental Results


In this section, we present the results of applying our verification methodology 
to a number of CRDTs under different consistency models. We collected CRDT implementations from a number of sources [1,19,20], and since all of the existing implementations assume a very weak consistency model (primarily CC), we additionally implemented a few CRDTs of our own that are intended to work only under stronger consistency schemes but are better in terms of time/space complexity and ease of development. Our implementations are not written in any
specific language but instead are specified abstractly akin to the definitions given 
in Figs. 1 and 2. To specify CRDT states and operations, we fix an abstract lan- 
guage that contains uninterpreted datatypes (used for specifying elements of sets, 
lists, etc.), a set datatype with support for various set operations (add, delete, 
union, intersection, projection, lookup), a tuple datatype (along with operations 
to create tuples and project components) and a special uninterpreted datatype 
equipped with a total order for identifiers. Note that the set datatype used in 
our abstract language is different from the Set CRDT, as it is only intended to 
perform set operations locally at a replica. All existing CRDT definitions can be 
naturally expressed in this framework. 

Here, we revert to the op-based specification of CRDTs. For a given CRDT P = (Σ, O, σ_init), we convert all its operations into FOL formulas relating the source, input and output replica states. That is, for a CRDT operation o : Σ → Σ → Σ, we create a predicate o : Σ × Σ × Σ → 𝔹 such that o(σ_s, σ_i, σ_o) is true if and only if o(σ_s)(σ_i) = σ_o. Since CRDT states are typically expressed as sets, we axiomatize set operations to express their semantics in FOL.
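The translation from an operation to a ternary predicate is mechanical. The Python sketch below is only an executable stand-in for the FOL encoding: it enumerates a tiny finite state space instead of axiomatizing sets. Its `as_predicate` helper (a name introduced here) turns o into the relation o(σ_s, σ_i, σ_o) ⇔ o(σ_s)(σ_i) = σ_o, and the source-dependent remove operation is a hypothetical example.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet

State = FrozenSet[str]
Operation = Callable[[State], Callable[[State], State]]  # o : Sigma -> Sigma -> Sigma
Predicate = Callable[[State, State, State], bool]


def as_predicate(op: Operation) -> Predicate:
    """Relational view of op: true iff op(src)(inp) == out."""
    return lambda src, inp, out: op(src)(inp) == out


def remove_seen(x: str) -> Operation:
    # Hypothetical operation: removes x only if x was in the source state.
    return lambda src: (lambda s: s - ({x} & src))


universe = ["a", "b"]
states = [frozenset(c) for c in chain.from_iterable(
    combinations(universe, r) for r in range(len(universe) + 1))]
rel = as_predicate(remove_seen("a"))
triples = [(s, i, o) for s in states for i in states for o in states if rel(s, i, o)]
# The relation is functional in (source, input): exactly one output per pair.
print(len(triples) == len(states) ** 2)   # True
```

In an SMT encoding the same functionality would have to be stated or implied by the axioms, since a solver does not get it for free from a relational encoding.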

In order to specify a consistency model, we introduce a sort for events and 
binary predicates vis and eo over this sort. Here, we can take advantage of the 
declarative specification of consistency models and directly encode them in FOL. 
Given an encoding of CRDT operations and a consistency model, our verifica- 
tion strategy is to determine whether the Non-Interf property holds. Since both 
conditions of this property only involve executions of finite length (at most 3), 
we can directly encode them as UNSAT queries by asking for executions which 
break the conditions. For condition (1), we query for the existence of two events η1 and η2 along with vis and eo predicates which satisfy the consistency specification Ψ such that these events are not related by eo and their effectors do not commute. For condition (2), we query for the existence of events η1, η2, η3 and their respective start states σ1, σ2, σ3, such that η1 and η2 commute modulo Ψ but after interference from η3, they are not related by eo and do not commute.
Both these queries are encoded in EPR [18], a decidable fragment of FOL, so 
if the CRDT operations and the consistency policy can also be encoded in a 
decidable fragment of FOL (which is the case in all our experiments), then our 
verification strategy is also decidable. We write Non-Interf-1 and Non-Interf-2 for 
the two conditions of Non-Interf. 
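We do not reproduce the EPR encoding here. As a rough executable stand-in for the condition (1) query, the Python sketch below exhaustively searches the two-event executions of a Simple-Set-style CRDT over a two-element universe for a pair that is not ordered by eo yet whose effectors fail to commute. As assumptions of this sketch, effectors are taken to be independent of the source state, and the policy is modeled by a hypothetical function deciding whether the two events are ordered given their operations and visibility. A bounded search like this can find counterexamples but, unlike the SMT query, proves nothing beyond the chosen universe.

```python
from itertools import chain, combinations, product
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
Op = Tuple[str, str]                       # ("Add" | "Remove", element)
Ordered = Callable[[Op, Op, bool], bool]   # does the policy put them in eo?

UNIVERSE = ["a", "b"]
STATES = [frozenset(c) for c in chain.from_iterable(
    combinations(UNIVERSE, r) for r in range(len(UNIVERSE) + 1))]


def effector(op: Op) -> Callable[[State], State]:
    """Simple-Set-style effectors (assumed independent of the source state)."""
    kind, x = op
    return (lambda s: s | {x}) if kind == "Add" else (lambda s: s - {x})


def non_interf_1_counterexample(ordered: Ordered) -> Optional[Tuple[Op, Op, bool]]:
    ops: List[Op] = [(k, x) for k in ("Add", "Remove") for x in UNIVERSE]
    for op1, op2, visible in product(ops, ops, (False, True)):
        if ordered(op1, op2, visible):
            continue                       # related by eo: commute modulo the policy
        e1, e2 = effector(op1), effector(op2)
        if any(e1(e2(s)) != e2(e1(s)) for s in STATES):
            return op1, op2, visible
    return None


def ec(op1: Op, op2: Op, visible: bool) -> bool:
    return False                           # EC never orders effectors


def psi_rb(op1: Op, op2: Op, visible: bool) -> bool:
    # Assumed PSI+RB instance: Add/Remove on the same element are synchronized,
    # hence always visible to each other and ordered by eo.
    return op1[1] == op2[1] and op1[0] != op2[0]


print(non_interf_1_counterexample(ec))       # (('Add', 'a'), ('Remove', 'a'), False)
print(non_interf_1_counterexample(psi_rb))   # None
```

The EC counterexample is the Add/Remove pair on the same element, matching the Simple-Set entries reported for EC and PSI+RB in Fig. 4.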

Datatype | CRDT | EC | CC | PSI+RB | PSI | Verif. Time (s)
Set | Simple-Set | ✗ | ✗ | ✓ | ✓ | 0.23
Set | ORSet [20] | ✗ | ✓ | ✓ | ✓ | 0.6
Set | ORSet with Tombstones | ✓ | ✓ | ✓ | ✓ | 0.04
Set | USet [20] | ✗ | ✗ | ✗ | ✓ | 0.1
List | RGA [1] | ✗ | ✓ | ✓ | ✓ | 5.3
List | RGA-No-Tomb | ✗ | ✗ | ✓ | ✓ | 3
Graph | 2P2P-Graph [20] | ✗ | ✓ | ✓ | ✓ | 3.5
Graph | Graph-with-ORSet | ✗ | ✗ | ✓ | ✓ | 46.3

Fig. 4. Convergence of CRDTs under different consistency policies.

Figure 4 shows the results of applying the proposed methodology on different CRDTs. We used Z3 to discharge our satisfiability queries. For every combination of a CRDT and a consistency policy, we write ✗ to indicate that verification of Non-Interf failed, while ✓ indicates that it was satisfied. We also report the
verification time taken by Z3 for every CRDT across all consistency policies 
executing on a standard desktop machine. We have picked the three collection 
datatypes for which CRDTs have been proposed i.e. Set, List and Graph, and 
for each such datatype, we consider multiple variants that provide a tradeoff 
between consistency requirements and implementation complexity. Apart from 
EC, CC and PSI, we also use a combination of PSI and RB, which only enforces
PSI between selected pairs of operations (in contrast to simple RB which would 
enforce SC between all selected pairs). Note that when verifying a CRDT under 
PSI, we assume that the set operations are implemented as Boolean assignments, 
and the write set Wr consists of elements added/removed. We are unaware of 
any prior effort that has been successful in automatically verifying any CRDT, 
let alone those that exhibit the complexity of the ones considered here. 


Set: The Simple-Set CRDT in Fig. 1 does not converge under EC or CC, but
achieves convergence under PSI+RB which only synchronizes Add and Remove 
operations to the same elements, while all other operations continue to run under 
EC, since they do commute with each other. As explained earlier, ORSet does not 
converge under EC and violates Non-Interf-1. ORSet with tombstones converges 
under EC as well since it uses a different set (called a tombstone) to keep track of 
removed elements. USet is another implementation of the Set CRDT which con- 
verges under the assumptions that an element is only added once, and removes 
only work if the element is already present in the source replica. USet converges 
only under PSI, because under any weaker consistency model, Non-Interf-2
breaks, since Add(a) interferes and breaks the commutativity of Add(a) and 
Remove(a). Notice that as the consistency level weakens, implementations need 
to keep more and more information to maintain convergence: compute unique ids, tag elements with them, or keep track of deleted elements. If the underlying replicated store supports stronger consistency levels such as PSI, simpler definitions are sufficient.


List: The List CRDT maintains a total ordering between its elements. It supports two operations: AddRight(e, a) adds new element a to the right of existing element e, while Remove(e) removes e from the list. We use the implementation in [1] (called RGA), which uses time-stamped insertion trees. To maintain integrity of the tree structure, the immediate predecessor of every list element must be present in the list, due to which operations AddRight(a, b) and AddRight(b, c) do not commute. Hence RGA does not converge under EC because Non-Interf-1 is violated, but converges under CC.
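The dependency between AddRight(a, b) and AddRight(b, c) can be seen on a deliberately simplified list model. The Python sketch below is not RGA: it has no timestamps or insertion tree. It only keeps the requirement stated above, that an element can be inserted to the right of an anchor only when the anchor is present, and it treats a missing anchor as a no-op, which is our assumption.

```python
from typing import Callable, Tuple

ListState = Tuple[str, ...]
Effector = Callable[[ListState], ListState]


def add_right(anchor: str, new: str) -> Effector:
    """Insert `new` immediately right of `anchor`; no-op if the anchor is absent."""
    def apply(xs: ListState) -> ListState:
        if anchor not in xs:
            return xs
        k = xs.index(anchor) + 1
        return xs[:k] + (new,) + xs[k:]
    return apply


ab, bc = add_right("a", "b"), add_right("b", "c")
start: ListState = ("a",)
print(bc(ab(start)))   # ('a', 'b', 'c'): AddRight(a, b) applied first
print(ab(bc(start)))   # ('a', 'b'): AddRight(b, c) finds no anchor b
```

Under CC the second order cannot arise when AddRight(b, c) was issued after seeing AddRight(a, b), since visibility then forces the effector order.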

To make adds and removes involving the same list element commute, RGA 
maintains a tombstone set for all deleted list elements. This can be expensive as 
deleted elements may potentially need to be tracked forever, even with garbage 
collection. We consider a slight modification of RGA called RGA-No-Tomb which 
does not keep track of deleted elements. This CRDT now has a convergence 
violation under CC (because of Non-Interf-1), but achieves convergence under 
PSI+RB where we enforce PSI only for pairs of AddRight and Remove operations. 


Graph: The Graph CRDT maintains sets of vertices and edges and supports 
operations to add and remove vertices and edges. The 2P2P-Graph specification 
uses separate 2P-Sets for both vertices and edges, where a 2P-Set itself main- 
tains two sets for addition and removal of elements. While 2P sets themselves 
converge under EC, the 2P2P-Graph has convergence violations (to Non-Interf- 
1) involving AddVertex(v) and RemoveVertex(v) (similarly for edges) since it 
removes a vertex from a replica only if it is already present. We verify that it 
converges under CC. Graphs require an integrity constraint that edges in the 
edge-set must always be incident on vertices in the vertex-set. Since concurrent 
RemoveVertex(v) and AddEdge(v,v’) can violate this constraint, the 2P2P- 
Graph uses the internal structure of the 2P-Set which keeps track of deleted 
elements and considers an edge to be in the edge set only if its vertices are not 
in the vertex tombstone set (leading to a remove-wins strategy). 
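The 2P-Set mechanism and the remove-wins edge lookup described above can be sketched as follows. This Python code is a simplified illustration under our own assumptions: it omits the "remove only if present" preconditions and stores edges as ordered pairs. It shows that 2P-Set add and remove commute because each only grows one of the two internal sets, and that an edge incident to a tombstoned vertex is filtered out at lookup time.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Hashable


@dataclass(frozen=True)
class TwoPSet:
    added: FrozenSet[Hashable] = frozenset()
    removed: FrozenSet[Hashable] = frozenset()   # tombstone set

    def add(self, x: Hashable) -> "TwoPSet":
        return TwoPSet(self.added | {x}, self.removed)

    def remove(self, x: Hashable) -> "TwoPSet":
        return TwoPSet(self.added, self.removed | {x})

    def lookup(self, x: Hashable) -> bool:
        return x in self.added and x not in self.removed


@dataclass(frozen=True)
class TwoP2PGraph:
    vertices: TwoPSet = field(default_factory=TwoPSet)
    edges: TwoPSet = field(default_factory=TwoPSet)   # edges stored as (u, v) pairs

    def has_edge(self, u: Hashable, v: Hashable) -> bool:
        # Remove-wins: an edge counts only if neither endpoint is tombstoned.
        return (self.edges.lookup((u, v))
                and u not in self.vertices.removed
                and v not in self.vertices.removed)


def remove_vertex(g: TwoP2PGraph, x: Hashable) -> TwoP2PGraph:
    return TwoP2PGraph(g.vertices.remove(x), g.edges)


def add_edge(g: TwoP2PGraph, u: Hashable, v: Hashable) -> TwoP2PGraph:
    return TwoP2PGraph(g.vertices, g.edges.add((u, v)))


g = TwoP2PGraph(TwoPSet().add("v").add("w"), TwoPSet())
left = add_edge(remove_vertex(g, "v"), "v", "w")     # RemoveVertex(v) first
right = remove_vertex(add_edge(g, "v", "w"), "v")    # AddEdge(v, w) first
print(left == right, left.has_edge("v", "w"))        # True False
```

Both orders reach the same internal state, and the integrity constraint is restored at lookup time rather than by rejecting the concurrent AddEdge(v, w).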

Building a graph CRDT can be viewed as an exercise in composing CRDTs 
by using two ORSet CRDTs, keeping the internal implementation of the ORSet 
opaque, using only its interface. The Graph-with-ORSet implementation uses 
separate ORSets for vertices and edges and explicitly maintains the graph 
integrity constraint. We find convergence violations (to Non-Interf-1) between 
RemoveVertex(v) and AddEdge(v,v’), and RemoveVertex(v) and 
RemoveEdge(v,v’) under both EC and CC. Under PSI+RB (enforcing RB on 
the above two pairs of operations), we were able to show convergence. 

When a CRDT passes Non-Interf under a consistency policy, we can guar- 
antee that it achieves SEC under that policy. However, if it fails Non-Interf, it 
may or may not converge. In particular, if it fails Non-Interf-1 it will definitely 
not converge (because Non-Interf-1 constructs a well-formed execution), but if 
it passes Non-Interf-1 and fails Non-Interf-2, it may still converge because of 
the imprecision of Non-Interf-2. There are two sources of imprecision, both concerning the start states of the events picked in the condition: (1) we only use commutativity as a distinguishing property of the start states, but this may not be a sufficiently strong inductive invariant, and (2) we place no constraints on the start state of the interfering operation. In practice, we have found that for all cases except USet, convergence violations manifest via failure of Non-Interf-1. If Non-Interf-2 breaks, we can search for well-formed executions of greater length up to a bound. For USet, we were successful in adopting this approach, and were able to find a non-convergent well-formed execution of length 3.


6 Related Work and Conclusions 


Reconciling concurrent updates in a replicated system is an important, well-studied
problem in distributed applications, having been first studied in the context of 
collaborative editing systems [17]. Incorrect implementation of replicated sets 
in Amazon’s Dynamo system [7] motivated the design of CRDTs as a princi- 
pled approach to implementing replicated data types. Devising correct imple- 
mentations has proven to be challenging, however, as evidenced by the myriad 
pre-conditions specified in the various CRDT implementations [20]. 

Burckhardt et al. [6] present an abstract event-based framework to describe 
executions of CRDTs under different network conditions; they also propose a 
rigorous correctness criterion in the form of abstract specifications. Their proof 
strategy, which is neither automated nor parametric on consistency policies, ver- 
ifies CRDT implementations against these specifications by providing a simula- 
tion invariant between CRDT states and event structures. Zeller et al. [24] also 
require simulation invariants to verify convergence, although they only target 
state-based CRDTs. Gomes et al. [9] provide mechanized proofs of convergence 
for ORSet and RGA CRDTs under causal consistency, but their approach is 
neither automated nor parametric. 

A number of earlier efforts [2, 10-12, 22] have looked at the problem of verify- 
ing state-based invariants in distributed applications. These techniques typically 
target applications built using CRDTs, and assume their underlying correctness. 
Because they target correctness specifications in the form of state-based invari- 
ants, it is unclear if their approaches can be applied directly to the convergence 
problem we consider here. Other approaches [4,5,16] have also looked at the ver- 
ification problem of transactional programs running on replicated systems under 
weak consistency, but these proposals typically use serializability as the correct- 
ness criterion, adopting a “last-writer wins” semantics, rather than convergence, 
to deal with concurrent updates. 

This paper demonstrates the automated verification of CRDTs under dif- 
ferent weak consistency policies. We rigorously define the relationship between 
commutativity and convergence, formulating the notion of commutativity mod- 
ulo consistency policy as a sufficient condition for convergence. While we require 
a non-trivial inductive argument to show that non-interference to commutativ- 
ity is sufficient for convergence, the condition itself is designed to be simple 
and amenable to automated verification using off-the-shelf theorem-provers. We have successfully applied the proposed verification strategy to all major CRDTs,
additionally motivating the need for parameterization in consistency policies by 
showing variants of existing CRDTs which are simpler in terms of implementa- 
tion complexity but converge under different weak consistency models. 


Acknowledgments. We thank the anonymous reviewers for their insightful com- 
ments. This material is based upon work supported by the National Science Founda- 
tion under Grant No. CCF-SHF 1717741 and the Air Force Research Lab under Grant 
No. FA8750-17-1-0006. 


References 


1. Attiya, H., Burckhardt, S., Gotsman, A., Morrison, A., Yang, H., Zawirski, M.: Specification and complexity of collaborative text editing. In: Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, 25-28 July 2016, pp. 259-268 (2016). https://doi.org/10.1145/2933057.2933090

2. Bailis, P., Fekete, A., Franklin, M.J., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Coordination avoidance in database systems. PVLDB 8(3), 185-196 (2014). https://doi.org/10.14778/2735508.2735509. http://www.vldb.org/pvldb/vol8/p185-bailis.pdf

3. Bailis, P., Ghodsi, A.: Eventual consistency today: limitations, extensions, and beyond. Commun. ACM 56(5), 55-63 (2013). https://doi.org/10.1145/2447976.2447992

4. Bernardi, G., Gotsman, A.: Robustness against consistency models with atomic visibility. In: 27th International Conference on Concurrency Theory, CONCUR 2016, 23-26 August 2016, Québec City, Canada, pp. 7:1-7:15 (2016). https://doi.org/10.4230/LIPIcs.CONCUR.2016.7

5. Brutschy, L., Dimitrov, D., Müller, P., Vechev, M.T.: Static serializability analysis for causal consistency. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, 18-22 June 2018, pp. 90-104 (2018). https://doi.org/10.1145/3192366.3192415

6. Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types: specification, verification, optimality. In: The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2014, San Diego, CA, USA, 20-21 January 2014, pp. 271-284 (2014). https://doi.org/10.1145/2535838.2535848

7. DeCandia, G., et al.: Dynamo: amazon's highly available key-value store. In: Proceedings of the 21st ACM Symposium on Operating Systems Principles 2007, SOSP 2007, Stevenson, Washington, USA, 14-17 October 2007, pp. 205-220 (2007). https://doi.org/10.1145/1294261.1294281

8. Gilbert, S., Lynch, N.A.: Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2), 51-59 (2002). https://doi.org/10.1145/564585.564601

9. Gomes, V.B.F., Kleppmann, M., Mulligan, D.P., Beresford, A.R.: Verifying strong eventual consistency in distributed systems. PACMPL 1(OOPSLA), 109:1-109:28 (2017). https://doi.org/10.1145/3133933


K. Nagar and S. Jagannathan

10. Gotsman, A., Yang, H., Ferreira, C., Najafzadeh, M., Shapiro, M.: 'Cause I'm strong enough: reasoning about consistency choices in distributed systems. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20-22 January 2016, pp. 371-384 (2016). https://doi.org/10.1145/2837614.2837625

11. Houshmand, F., Lesani, M.: Hamsaz: replication coordination analysis and synthesis. PACMPL 3(POPL), 74:1-74:32 (2019). https://dl.acm.org/citation.cfm?id=3290387

12. Kaki, G., Earanky, K., Sivaramakrishnan, K.C., Jagannathan, S.: Safe replication through bounded concurrency verification. PACMPL 2(OOPSLA), 164:1-164:27 (2018). https://doi.org/10.1145/3276534

13. Li, C., Porto, D., Clement, A., Gehrke, J., Preguiça, N.M., Rodrigues, R.: Making geo-replicated systems fast as possible, consistent when necessary. In: 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, 8-10 October 2012, pp. 265-278 (2012). https://www.usenix.org/conference/osdi12/technical-sessions/presentation/li

14. Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In: Proceedings of the 23rd ACM Symposium on Operating Systems Principles 2011, SOSP 2011, Cascais, Portugal, 23-26 October 2011, pp. 401-416 (2011). https://doi.org/10.1145/2043556.2043593

15. Nagar, K., Jagannathan, S.: Automated Parameterized Verification of CRDTs (Extended Version). https://arxiv.org/abs/1905.05684

16. Nagar, K., Jagannathan, S.: Automated detection of serializability violations under weak consistency. In: 29th International Conference on Concurrency Theory, CONCUR 2018, 4-7 September 2018, Beijing, China, pp. 41:1-41:18 (2018). https://doi.org/10.4230/LIPIcs.CONCUR.2018.41

17. Nichols, D.A., Curtis, P., Dixon, M., Lamping, J.: High-latency, low-bandwidth windowing in the jupiter collaboration system. In: Proceedings of the 8th Annual ACM Symposium on User Interface Software and Technology, UIST 1995, Pittsburgh, PA, USA, 14-17 November 1995, pp. 111-120 (1995). https://doi.org/10.1145/215585.215706

18. Piskac, R., de Moura, L.M., Bjørner, N.: Deciding effectively propositional logic using DPLL and substitution sets. J. Autom. Reasoning 44(4), 401-424 (2010). https://doi.org/10.1007/s10817-009-9161-6

19. Preguiça, N.M., Baquero, C., Shapiro, M.: Conflict-free replicated data types (CRDTs). CoRR abs/1805.06358 (2018). http://arxiv.org/abs/1805.06358

20. Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: A comprehensive study of Convergent and Commutative Replicated Data Types. Technical report RR-7506, INRIA, Inria - Centre Paris-Rocquencourt (2011)

21. Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-free replicated data types. In: Défago, X., Petit, F., Villain, V. (eds.) SSS 2011. LNCS, vol. 6976, pp. 386-400. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24550-3_29

22. Sivaramakrishnan, K.C., Kaki, G., Jagannathan, S.: Declarative programming over eventually consistent data stores. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, 15-17 June 2015, pp. 413-424 (2015). https://doi.org/10.1145/2737924.2737981


Automated Parameterized Verification of CRDTs


23. Sovran, Y., Power, R., Aguilera, M.K., Li, J.: Transactional storage for geo-replicated systems. In: Proceedings of the 23rd ACM Symposium on Operating Systems Principles 2011, SOSP 2011, Cascais, Portugal, 23-26 October 2011, pp. 385-400 (2011). https://doi.org/10.1145/2043556.2043592

24. Zeller, P., Bieniusa, A., Poetzsch-Heffter, A.: Formal specification and verification of CRDTs. In: Ábrahám, E., Palamidessi, C. (eds.) FORTE 2014. LNCS, vol. 8461, pp. 33-48. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43613-4_3


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




What’s Wrong with On-the-Fly Partial 
Order Reduction 


Stephen F. Siegel


University of Delaware, Newark, DE, USA 
siegel@udel.edu 


Abstract. Partial order reduction and on-the-fly model checking are 
well-known approaches for improving model checking performance. The 
two optimizations interact in subtle ways, so care must be taken when 
using them in combination. A standard algorithm combining the two 
optimizations, published over twenty years ago, has been widely stud- 
ied and deployed in popular model checking tools. Yet the algorithm is 
incorrect. Counterexamples were discovered using the Alloy analyzer. A 
fix for a restricted class of property automata is proposed. 
1 Introduction 


Partial order reduction (POR) refers to a family of model checking techniques 
used to reduce the size of the state space that must be explored when verifying 
a property of a program. The techniques vary, but all share the core observation 
that when two independent operations are enabled in a state, it is often safe to 
ignore traces that begin with one of them. A large number of POR techniques 
have been explored, differing in details such as the range of properties to which 
they apply. This paper focuses on ample set POR [4], an approach which applies 
to stutter-invariant properties and is used in the model checker Spin [8]. 

In the automata-theoretic view of model checking, the negation of the property
to be verified is represented by an ω-automaton. The basic algorithm computes
the product of this automaton with the state space of the program. The
language of the product is empty if and only if the program cannot violate the 
property. On-the-fly model checking refers to an optimization of this basic algo- 
rithm in which the enumeration of the reachable program states, computation of 
the product, and language emptiness check are interleaved, rather than occurring 
in sequence. 

These two optimizations must be combined with care, because they interact 
in subtle ways.¹ A standard algorithm for on-the-fly ample set POR is described


1 Previous work, for example, has dealt with problems, distinct from those discussed 
in this paper, that arise when combining nested depth first search and POR [7,14]. 
© The Author(s) 2019 
in [12] and in further detail in [13]. I shall refer to this algorithm as the combined 
algorithm. Theorem 4.2 of [13] asserts the soundness of the combined algorithm. 
A proof of the theorem is also given in [13]. 

The proof has a gap. This was pointed out in [16, Sect. 5], with details in 
[15]. The gap was rediscovered in the course of developing mechanized correctness 
proofs for model checking algorithms; an explicit counterexample to the incorrect 
proof step was also found ([2, Sect. 8.4.5] and [3, Sect. 5]). The fact that the 
proof is erroneous, however, does not imply the theorem is wrong. To the best 
of my knowledge, no one has yet produced a proof or a counterexample for the 
soundness of the combined algorithm. 

In this paper, I show that the combined algorithm is not sound; a counterex- 
ample is given in Sect. 3.1. I found this counterexample by modeling the com- 
bined algorithm in Alloy and using the Alloy analyzer [11] to check its soundness. 
Sect. 4 describes this model. Spin’s POR is based on the combined algorithm, 
and in Sect. 5, Spin is seen to return an incorrect result on a Promela model 
derived from the theoretical counterexample. 

There is a small adjustment to the combined algorithm, yielding an algo- 
rithm that is arguably more natural and that returns the correct result on the 
previous counterexample; this is described in Sect. 6. It turns out this one is also 
unsound, as demonstrated by another Alloy-produced counterexample. However, 
in Sect. 7, I show that this variation is sound if certain restrictions are placed on 
the property automaton. 


2 Preliminaries 


Definition 1. A finite state program is a triple $P = (T, Q, \iota)$, where $Q$ is a
finite set of states, $\iota \in Q$ is the initial state, and $T$ is a finite set of operations.
Each operation $\alpha \in T$ is a function from a set $\mathrm{en}_\alpha \subseteq Q$ to $Q$.


Fix a finite state program $P = (T, Q, \iota)$.
Definition 2. For $q \in Q$, define $\mathrm{en}(q) = \{\alpha \in T \mid q \in \mathrm{en}_\alpha\}$.


Definition 3. An execution of $P$ is an infinite sequence of operations $\alpha_1\alpha_2\cdots$
that generates the sequence of states $\xi = q_0q_1q_2\cdots$ such that $q_0 = \iota$ and, for
$i \geq 0$, $q_i \in \mathrm{en}_{\alpha_{i+1}}$ and $q_{i+1} = \alpha_{i+1}(q_i)$. An admissible sequence is any segment
of an execution.
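As an illustration of Definitions 1-3, the following Python sketch represents a finite state program explicitly. It assumes, purely for illustration, that each operation $\alpha$ is stored as a dict from the states of $\mathrm{en}_\alpha$ to their successors; `Program`, `en`, and `run` are hypothetical helper names, not part of any tool discussed here. The example program is the two-state program of Sect. 3.1 as read off its Promela encoding in Fig. 2 ($\alpha$ loops on A and on B, $\beta$ moves A to B).

```python
from typing import Dict, List, Set

# Hypothetical encoding: operation name -> {q: alpha(q)} for every q in en_alpha.
Program = Dict[str, Dict[str, str]]


def en(program: Program, q: str) -> Set[str]:
    """Definition 2: en(q) = {alpha in T | q in en_alpha}."""
    return {alpha for alpha, fn in program.items() if q in fn}


def run(program: Program, iota: str, ops: List[str]) -> List[str]:
    """States q_0 q_1 ... q_n generated by a finite sequence alpha_1 ... alpha_n
    (Definition 3): q_0 = iota, q_i in en_{alpha_{i+1}}, q_{i+1} = alpha_{i+1}(q_i)."""
    states = [iota]
    for alpha in ops:
        q = states[-1]
        if alpha not in en(program, q):
            raise ValueError(f"{alpha} is not enabled at {q}")
        states.append(program[alpha][q])
    return states


# Assumed program of Sect. 3.1: alpha loops on A and B, beta moves A to B.
fig1 = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
print(sorted(en(fig1, "A")))                       # ['alpha', 'beta']
print(run(fig1, "A", ["alpha", "beta", "alpha"]))  # ['A', 'A', 'B', 'B']
```

A finite sequence accepted by `run` from $\iota$ is a prefix of an execution, hence an admissible sequence.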


Definition 4. A Büchi automaton is a tuple $B = (S, \Delta, \Sigma, \delta, F)$, where $S$ is a
finite set of automaton states, $\Delta \subseteq S$ is the set of initial states, $\Sigma$ is a finite
set called the alphabet, $\delta \subseteq S \times \Sigma \times S$ is the transition relation, and $F \subseteq S$
is the set of accepting states. The language of $B$, denoted $\mathcal{L}(B)$, is the set of
all $\xi \in \Sigma^\omega$ generated by infinite paths in $B$, starting from an initial state, that
pass through an accepting state infinitely often.
Fix a finite set $AP$ of atomic propositions and let $\Sigma = 2^{AP}$.

Fix an interpretation mapping for $P$, i.e., a function $L\colon Q \to \Sigma$.
Definition 5. The language of $P$, denoted $\mathcal{L}(P)$, is the set of all infinite words
$L(q_0)L(q_1)\cdots \in \Sigma^\omega$, where $q_0q_1\cdots$ is the sequence of states generated by an
execution of $P$.


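Definition 6. A language $\mathcal{L} \subseteq \Sigma^\omega$ is stutter-invariant if, for any $a_0, a_1, \ldots \in \Sigma$
and positive integers $i_0, i_1, \ldots$, $a_0a_1\cdots \in \mathcal{L} \iff a_0^{i_0}a_1^{i_1}\cdots \in \mathcal{L}$, where $a^i$ denotes
the concatenation of $i$ copies of $a$.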


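Definition 7. Let $B = (S, \Delta, \Sigma, \delta, F)$ be a Büchi automaton with alphabet $\Sigma$.
The product of $P$ and $B$ is the Büchi automaton

$$P \otimes B = (Q \times S,\ \{\iota\} \times \Delta,\ T \times \Sigma,\ \delta_\otimes,\ Q \times F),$$
where

$$\delta_\otimes = \{((q, s), (\alpha, \sigma), (q', s')) \mid \sigma = L(q) \wedge (s, \sigma, s') \in \delta \wedge q' = \alpha(q)\}.$$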
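A minimal sketch of the product of Definition 7 follows. It assumes the same dict encoding of operations as above, Büchi transitions given as explicit $(s, \sigma, s')$ triples, and elements of $\Sigma = 2^{AP}$ represented as Python frozensets; it only builds the part of $P \otimes B$ reachable from $\{\iota\} \times \Delta$. The function name `product` and the data names are illustrative. The data is the counterexample of Sect. 3.1 (the program of Fig. 1 with property automaton $B_1$).

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

Label = FrozenSet[str]                       # an element of Sigma = 2^AP
Node = Tuple[str, int]                       # product state (q, s)
Edge = Tuple[Node, Tuple[str, Label], Node]  # ((q, s), (alpha, sigma), (q', s'))


def product(program: Dict[str, Dict[str, str]], L: Dict[str, Label],
            delta: Set[Tuple[int, Label, int]], iota: str,
            initial: Set[int]) -> Tuple[Set[Node], Set[Edge]]:
    """Reachable states and transitions of P (x) B as in Definition 7."""
    start = {(iota, s) for s in initial}
    states: Set[Node] = set(start)
    edges: Set[Edge] = set()
    work = deque(start)
    while work:
        q, s = work.popleft()
        sigma = L[q]
        for (src, lab, dst) in delta:
            if src != s or lab != sigma:
                continue  # the Buchi move must read L(q) from state s
            for alpha, fn in program.items():
                if q in fn:  # alpha is enabled at q, so q' = alpha(q) exists
                    y = (fn[q], dst)
                    edges.add(((q, s), (alpha, sigma), y))
                    if y not in states:
                        states.add(y)
                        work.append(y)
    return states, edges


# Assumed data of Sect. 3.1: B1 reads {} in state 0 and {p} in state 1.
EMPTY, P = frozenset(), frozenset({"p"})
fig1 = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
L = {"A": EMPTY, "B": P}
B1 = {(0, EMPTY, 0), (0, EMPTY, 1), (1, P, 1)}
states, edges = product(fig1, L, B1, "A", {0})
print(len(states), len(edges))  # 4 5
```

The counts agree with the 4 states and 5 transitions that Spin reports for the full search of the Promela encoding in Fig. 2.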


Note 1. A transition from product state $x = (q, s)$ can be viewed as taking
place in two steps. First, a transition $s \xrightarrow{L(q)} s'$ in $B$ executes, leading to an
"intermediate state" $x' = (q, s')$. Then a program transition $q \xrightarrow{\alpha} q'$ executes,
culminating in $y = (q', s')$. While this is a good mental model, the product
automaton does not necessarily contain a transition from $x$ to $x'$ or from $x'$ to $y$.
The intermediate state $x'$ is not even necessarily reachable in the product. The
transition in the product goes directly from $x$ to $y$ with label $(\alpha, L(q))$.


It is well-known that
$$\mathcal{L}(P) \cap \mathcal{L}(B) = \emptyset \iff \mathcal{L}(P \otimes B) = \emptyset.$$


In the context of model checking, $B$ is used to represent the negation of a
desirable property; the program $P$ satisfies the property if, and only if, no execution
of $P$ is accepted by $B$, i.e., $\mathcal{L}(P) \cap \mathcal{L}(B) = \emptyset$. The automaton $B$ may be generated
from a (negated) LTL formula, but that assumption is not needed here. 

The goal of “offline” (not on-the-fly) partial order reduction is to generate 
some subspace P’ of P with the guarantee that 


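$$\mathcal{L}(P') \cap \mathcal{L}(B) = \emptyset \iff \mathcal{L}(P) \cap \mathcal{L}(B) = \emptyset.$$
The emptiness of $\mathcal{L}(P' \otimes B) = \mathcal{L}(P') \cap \mathcal{L}(B)$ can be decided in various ways,
such as a nested depth first search (NDFS) [5].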
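For a finite product graph, emptiness reduces to finding an accepting lasso: an accepting state that is reachable from an initial state and lies on a cycle. The sketch below uses plain reachability searches rather than NDFS, so it is quadratic in the worst case where NDFS is linear; it is meant only to pin down what is being decided. Edges are bare (source, target) pairs and all names are illustrative. The same test is what the `nonemptyLang` predicate of the Alloy model in Sect. 4 expresses.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def reachable(succ: Dict[Node, Set[Node]], sources: Iterable[Node]) -> Set[Node]:
    """Nodes reachable from `sources` in zero or more steps."""
    seen = set(sources)
    stack = list(seen)
    while stack:
        for y in succ.get(stack.pop(), set()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def nonempty(edges: Set[Tuple[Node, Node]], init: Iterable[Node],
             accepting: Callable[[Node], bool]) -> bool:
    """True iff some reachable accepting node lies on a cycle (an accepting lasso)."""
    succ: Dict[Node, Set[Node]] = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
    for x in reachable(succ, init):
        # x lies on a cycle iff it is reachable from one of its successors
        if accepting(x) and x in reachable(succ, succ.get(x, set())):
            return True
    return False


def is_accepting(x: Tuple[str, int]) -> bool:
    return x[1] == 1  # Buchi state 1 of B1 is the accepting state


full = {(("A", 0), ("A", 0)), (("A", 0), ("B", 0)), (("A", 0), ("A", 1)),
        (("A", 0), ("B", 1)), (("B", 1), ("B", 1))}
print(nonempty(full, [("A", 0)], is_accepting))                           # True
print(nonempty(full - {(("A", 0), ("B", 1))}, [("A", 0)], is_accepting))  # False
```

Here `("A", 0)` stands for the product state (A, 0); the edge set is the reachable full product of the counterexample in Sect. 3.1, and the second call removes the transition from (A, 0) to (B, 1).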


3 On-the-Fly Partial Order Reduction 


In on-the-fly model checking, the state space of the product automaton is enu- 
merated directly, without first enumerating the program states. Adding POR 
to the mix means that at each state reached in the product automaton, some 
subset of enabled transitions will be explored. The goal is to ensure that if the 
language of the full product automaton is nonempty, then the language of the 
resulting reduced automaton must be nonempty. 

To make this precise, fix a finite state program $P = (T, Q, \iota)$, a set $AP$ of
atomic propositions, an interpretation $L\colon Q \to \Sigma = 2^{AP}$, and Büchi automaton
$B = (S, \Delta, \Sigma, \delta, F)$. Let $\mathcal{A} = P \otimes B$.


Definition 8. A function $\mathrm{amp}\colon Q \times S \to 2^T$ is an ample selector if $\mathrm{amp}(q, s) \subseteq
\mathrm{en}(q)$ for all $q \in Q$, $s \in S$. Each $\mathrm{amp}(q, s)$ is an ample set.


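An ample selector determines a subautomaton $\mathcal{A}' = \mathrm{reduced}(\mathcal{A}, \mathrm{amp})$ of $\mathcal{A}$:
$\mathcal{A}'$ is defined exactly as in Definition 7, except that the transition relation has
the additional restriction that $\alpha \in \mathrm{amp}(q, s')$:

$$\mathcal{A}' = (Q \times S,\ \{\iota\} \times \Delta,\ T \times \Sigma,\ \delta',\ Q \times F) \tag{1}$$

$$\delta' = \{((q, s), (\alpha, \sigma), (q', s')) \in (Q \times S) \times (T \times \Sigma) \times (Q \times S) \mid \sigma = L(q) \wedge (s, \sigma, s') \in \delta \wedge \alpha \in \mathrm{amp}(q, s') \wedge q' = \alpha(q)\}. \tag{2}$$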
Definition 9. An ample selector amp is POR-sound if the following holds:
$$\mathcal{L}(\mathrm{reduced}(\mathcal{A}, \mathrm{amp})) = \emptyset \implies \mathcal{L}(P) \cap \mathcal{L}(B) = \emptyset.$$
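Equation (2) differs from Definition 7 only by the conjunct $\alpha \in \mathrm{amp}(q, s')$. The sketch below applies that conjunct to an explicit set of product transitions. It assumes transitions are stored as $((q, s), (\alpha, \sigma), (q', s'))$ tuples and that `amp` is a dict keyed by $(q, s)$; `reduced_edges` is a hypothetical name. Reachability from the initial state still has to be computed afterwards, so a transition that survives the filter may be unreachable in $\mathrm{reduced}(\mathcal{A}, \mathrm{amp})$. The data are the full product transitions and ample selector of the counterexample in Sect. 3.1.

```python
from typing import Dict, FrozenSet, Set, Tuple

Node = Tuple[str, int]
Edge = Tuple[Node, Tuple[str, FrozenSet[str]], Node]


def reduced_edges(edges: Set[Edge], amp: Dict[Node, Set[str]]) -> Set[Edge]:
    """Apply the extra conjunct of (2): keep ((q, s), (alpha, sigma), (q', s'))
    only if alpha is in amp(q, s'); the ample set is looked up at the
    intermediate state (q, s'), not at the source state (q, s)."""
    return {(x, lab, y) for (x, lab, y) in edges if lab[0] in amp[(x[0], y[1])]}


E, P = frozenset(), frozenset({"p"})
full = {
    (("A", 0), ("alpha", E), ("A", 0)), (("A", 0), ("beta", E), ("B", 0)),
    (("A", 0), ("alpha", E), ("A", 1)), (("A", 0), ("beta", E), ("B", 1)),
    (("B", 1), ("alpha", P), ("B", 1)),
}
amp = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}
red = reduced_edges(full, amp)
print(len(red))                                  # 4
print((("A", 0), ("beta", E), ("B", 1)) in red)  # False
```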


The goal is to define some constraints on an ample selector that guarantee 
it is POR-sound. Before stating the constraints, we need two more concepts: 


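Definition 10. An independence relation is an irreflexive and symmetric relation
$I \subseteq T \times T$ satisfying the following: if $(\alpha, \beta) \in I$ and $q \in \mathrm{en}_\alpha \cap \mathrm{en}_\beta$, then

$$\alpha(q) \in \mathrm{en}_\beta,\quad \beta(q) \in \mathrm{en}_\alpha,\quad \text{and}\quad \alpha(\beta(q)) = \beta(\alpha(q)).$$
Fix an independence relation $I$. We say $\alpha$ and $\beta$ are dependent if $(\alpha, \beta) \notin I$.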


Definition 11. An operation $\alpha \in T$ is invisible with respect to $L$ if, for all
$q \in \mathrm{en}_\alpha$, $L(q) = L(\alpha(q))$.
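Definitions 10 and 11 are easy to check on an explicit program. In the sketch below, `commute(program, a, b)` tests the condition of Definition 10 for a single pair, so any symmetric set of pairs of distinct operations that all pass it is an independence relation; `invisible` tests Definition 11. Both names, and the dict encoding of operations, are assumptions made for illustration; they play the role of the `independent` and `invisible` predicates of the Alloy model in Sect. 4.

```python
from typing import Dict, FrozenSet

Program = Dict[str, Dict[str, str]]  # operation -> {q: alpha(q)} on en_alpha


def commute(program: Program, a: str, b: str) -> bool:
    """Condition of Definition 10 for the pair (a, b): wherever both are enabled,
    each stays enabled after the other, and a(b(q)) = b(a(q))."""
    fa, fb = program[a], program[b]
    for q in fa.keys() & fb.keys():
        qa, qb = fa[q], fb[q]
        if qa not in fb or qb not in fa or fb[qa] != fa[qb]:
            return False
    return True


def invisible(program: Program, L: Dict[str, FrozenSet[str]], a: str) -> bool:
    """Definition 11: L(q) = L(a(q)) for every q in en_a."""
    return all(L[q] == L[qa] for q, qa in program[a].items())


fig1 = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
L = {"A": frozenset(), "B": frozenset({"p"})}
print(commute(fig1, "alpha", "beta"))                           # True
print(invisible(fig1, L, "alpha"), invisible(fig1, L, "beta"))  # True False
```

On the two-state program of Sect. 3.1, $\alpha$ and $\beta$ commute, $\alpha$ is invisible and $\beta$ is not, matching the claims made there.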


Note 2. The definition in [13] is slightly different. Given an LTL formula $\phi$ over
$AP$, let $AP'$ be the set of atomic propositions occurring syntactically in $\phi$. The
definition in [13] says $\alpha$ is invisible in $\phi$ if, for all $p \in AP'$ and $q \in \mathrm{en}_\alpha$,
$p \in L(q) \iff p \in L(\alpha(q))$. However, there is no loss of generality using Definition 11,
since one can define a new interpretation $L'\colon Q \to 2^{AP'}$ by $L'(q) = L(q) \cap AP'$.
Then $\alpha$ is invisible for $\phi$ if, and only if, $\alpha$ is invisible with respect to $L'$, and the
results of this paper can be applied without modification to $P$, $AP'$, and $L'$.


We now define the following constraints on an ample selector amp:²


C0 For all $q \in Q$, $s \in S$: $\mathrm{en}(q) \neq \emptyset \implies \mathrm{amp}(q, s) \neq \emptyset$.


² I am using the numbering from [4]. In [13], C2 and C3 are swapped.
C1 For all $q \in Q$, $s \in S$: in any admissible sequence in $P$ starting from $q$, no
operation in $T \setminus \mathrm{amp}(q, s)$ that is dependent on an operation in $\mathrm{amp}(q, s)$ can
occur before some operation in $\mathrm{amp}(q, s)$ occurs.

C2 For all $q \in Q$, $s \in S$: if $\mathrm{amp}(q, s) \neq \mathrm{en}(q)$ then every $\alpha \in \mathrm{amp}(q, s)$ is invisible.

C3 There is a depth-first search of A’ = reduced(A,amp) with the following 
property: whenever there is a transition in A’ from a node (q, s) on the top 
of the stack to a node (q’,s’) on the stack, amp(q, s’) = en(q). 
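C0 and C2 are local conditions and can be checked directly on an explicit ample selector; C1 and C3 quantify over paths and over a search, and are not covered by this sketch. The helper names and the encoding (ample sets as a dict keyed by $(q, s)$) are hypothetical. The data are the ample selector of Sect. 3.1; the variant `bad` makes $\{\beta\}$ the proper ample set at $(A, 1)$, which violates C2 because $\beta$ is visible.

```python
from typing import Dict, FrozenSet, Set, Tuple

Program = Dict[str, Dict[str, str]]    # operation -> {q: alpha(q)}
Amp = Dict[Tuple[str, int], Set[str]]  # (q, s) -> amp(q, s)


def en(program: Program, q: str) -> Set[str]:
    return {a for a, fn in program.items() if q in fn}


def is_selector(program: Program, amp: Amp) -> bool:
    """Definition 8: every ample set is a subset of en(q)."""
    return all(A <= en(program, q) for (q, _), A in amp.items())


def check_c0(program: Program, amp: Amp) -> bool:
    """C0: en(q) nonempty implies amp(q, s) nonempty."""
    return all(bool(A) or not en(program, q) for (q, _), A in amp.items())


def check_c2(program: Program, L: Dict[str, FrozenSet[str]], amp: Amp) -> bool:
    """C2: a proper ample set (amp(q, s) != en(q)) contains only invisible operations."""
    def invisible(a: str) -> bool:
        return all(L[q] == L[qa] for q, qa in program[a].items())
    return all(A == en(program, q) or all(invisible(a) for a in A)
               for (q, _), A in amp.items())


fig1 = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
L = {"A": frozenset(), "B": frozenset({"p"})}
amp = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}
bad = {**amp, ("A", 1): {"beta"}}
print(is_selector(fig1, amp), check_c0(fig1, amp), check_c2(fig1, L, amp))  # True True True
print(check_c2(fig1, L, bad))  # False
```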


Condition C3 is the interesting one. The combined algorithm of [13] enforces 
it using a DFS (the outer search of the NDFS) of the reduced space and the 
following protocol: given a new state (q,s) that has just been pushed onto the 
stack, first iterate over all Büchi transitions (s, L(q), s’) departing from s and 
labeled by L(q). For each of these, a candidate ample set for amp(q, s’) that 
satisfies the first three conditions is computed; this computation does not depend 
on s’. If any operation in that candidate set leads back to a state on the search 
stack (a “back edge”), a different candidate is tried and the process is repeated 
until a satisfactory one is found. If no such candidate is found, en(q) is used for 
the ample set. 
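The protocol above can be sketched for a single intermediate state $(q, s')$. This is a hedged reconstruction from the prose, not the code of [13] or of Spin: it assumes the candidate ample sets (already satisfying C0-C2) are supplied as a list, and it rejects a candidate when one of its operations leads to a product state $(\alpha(q), s')$ that is on the search stack. `choose_ample` and its parameters are illustrative names.

```python
from typing import Dict, List, Set, Tuple

Program = Dict[str, Dict[str, str]]  # operation -> {q: alpha(q)}


def choose_ample(program: Program, q: str, s_next: int,
                 candidates: List[Set[str]],
                 on_stack: Set[Tuple[str, int]]) -> Set[str]:
    """Choose amp(q, s') for the intermediate state (q, s').

    Each candidate is assumed to satisfy C0-C2 already and not to depend on s'.
    A candidate is rejected if one of its operations leads to a product state
    (alpha(q), s') on the DFS stack; if all are rejected, en(q) is used."""
    for cand in candidates:
        if all((program[a][q], s_next) not in on_stack for a in cand):
            return cand
    return {a for a, fn in program.items() if q in fn}  # fall back to en(q)


fig1 = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
stack = {("A", 0)}   # (A, 0) has just been pushed
cands = [{"alpha"}]  # hypothetical candidate list at A
print(sorted(choose_ample(fig1, "A", 0, cands, stack)))  # ['alpha', 'beta']
print(sorted(choose_ample(fig1, "A", 1, cands, stack)))  # ['alpha']
```

With (A, 0) on the stack and the single candidate $\{\alpha\}$, the choice for $s' = 0$ falls back to $\mathrm{en}(A)$ because $\alpha$ leads back to (A, 0), while the choice for $s' = 1$ keeps $\{\alpha\}$. Under these assumed inputs this yields the row for A of the ample selector in Sect. 3.1, and it shows how the selection depends on the stack.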

Hence the process for choosing the ample set depends on the current state of 
the search. If $y_1 \neq y_2$, it is not necessarily the case that $\mathrm{amp}(x, y_1) = \mathrm{amp}(x, y_2)$,
because it is possible that when $(x, y_1)$ was encountered, a back edge existed for
a candidate, but when $(x, y_2)$ was encountered, there was no back edge.


3.1 Counterexample 


Theorem 4.2 of [13] can be expressed as follows: if $\mathcal{L}(B)$ is stutter-invariant and
the language of an LTL formula, and amp satisfies C0-C3, then amp is POR-sound.


amp | 0                    | 1
A   | $\{\alpha, \beta\}$  | $\{\alpha\}$
B   | $\{\alpha\}$         | $\{\alpha\}$


Fig. 1. Counterexample to combined theorem. Left: program and interpretation. Center:
property automaton $B_1$ and ample selector function. Right: the reachable product
state space; dashed edges are in the full, but not reduced, space.


A counterexample to this claim is given in Fig. 1. The program consists of two
states, A and B, and two operations, $\alpha$ and $\beta$. There is a single atomic proposition,
$p$, which is false at A and true at B. Note that $\alpha$ and $\beta$ are independent.
Also, $\alpha$ is invisible, and $\beta$ is not.
The property automaton, $B_1$, is shown in Fig. 1 (center top). It has two states,
numbered 0 and 1. State 1 is the sole accepting state. The language consists of
all infinite words of the following form: a finite nonempty prefix of $\emptyset$s followed
by an infinite sequence of $\{p\}$s. This language is stutter-invariant, and is the
language of the LTL formula $(\neg p) \wedge ((\neg p)\,\mathsf{U}\,\mathsf{G}\,p)$.

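The ample selector is specified by the table (center bottom). Notice that
$\mathrm{amp}(A, 1) \neq \mathrm{en}(A)$, but the other three ample sets are full. C0 holds because
the ample sets are never empty. C1 holds because $\beta$ is independent of $\alpha$. C2
holds because $\alpha$ is invisible. The reachable product space is shown in Fig. 1
(right). In any DFS of $\mathrm{reduced}(\mathcal{A}, \mathrm{amp})$, the only back edge is the self-loop on
A0 labeled $(\alpha, \emptyset)$. Since $\mathrm{amp}(A, 0)$ is full, C3 holds. Yet there is an accepting
path in the full space (from A0 to B1 via $\beta$, then the self-loop on B1 forever), but
not in the reduced space, where that $\beta$ transition is excluded because it enters
Büchi state 1 and $\beta \notin \mathrm{amp}(A, 1)$.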
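The counterexample can be replayed mechanically. The sketch below hard-codes the program, interpretation, property automaton $B_1$, and ample selector of Fig. 1 (with the program read off its Promela encoding in Fig. 2), explores the reachable product with and without the restriction of (2), and searches each for an accepting lasso. It is an illustrative check under those assumptions, not the Alloy model of Sect. 4; all names are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

Node = Tuple[str, int]  # product state (q, s), e.g. ("A", 0) is A0
EMPTY, P = frozenset(), frozenset({"p"})
PROGRAM = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}  # op -> {q: op(q)}
L = {"A": EMPTY, "B": P}                                        # interpretation
B1 = {(0, EMPTY, 0), (0, EMPTY, 1), (1, P, 1)}                  # (s, sigma, s')
ACCEPTING = {1}
AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}


def explore(amp: Optional[Dict[Node, Set[str]]] = None) -> Dict[Node, Set[Node]]:
    """Successor map of the reachable product; with amp, apply the restriction of (2)."""
    succ: Dict[Node, Set[Node]] = {}
    work = deque([("A", 0)])
    while work:
        x = work.popleft()
        if x in succ:
            continue
        q, s = x
        succ[x] = set()
        for (src, sigma, s2) in B1:
            if src != s or sigma != L[q]:
                continue
            for alpha, fn in PROGRAM.items():
                # (2): alpha must be in amp(q, s') for the intermediate state (q, s')
                if q in fn and (amp is None or alpha in amp[(q, s2)]):
                    succ[x].add((fn[q], s2))
                    work.append((fn[q], s2))
    return succ


def has_lasso(succ: Dict[Node, Set[Node]]) -> bool:
    """True iff some reachable accepting state lies on a cycle."""
    def reach(sources: Set[Node]) -> Set[Node]:
        seen, stack = set(sources), list(sources)
        while stack:
            for y in succ.get(stack.pop(), set()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen
    return any(x[1] in ACCEPTING and x in reach(succ[x]) for x in succ)


print(has_lasso(explore()))     # True: A0 -> B1, then the self-loop on B1
print(has_lasso(explore(AMP)))  # False: the beta transition from A0 to B1 is pruned
```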


4 Alloy Models of POR Schemes 


Alloy is a “lightweight formal methods” language and tool. It has been used 
in a wide variety of contexts, from exploring software designs to studying weak 
memory-consistency models. An Alloy model specifies signatures, each of which 
defines a type, relations on signatures, and constraints on the signatures and 
relations. Constraints are expressed in a logic that combines elements of first 
order logic and relational logic, and includes a transitive closure operator. An 
instance of a model assigns a finite set of atoms to each signature, and a finite set 
of tuples (of the right type) to each relation, in such a way that the constraints 
are satisfied. The Alloy analyzer can be used to check that an assertion holds 
on all instances in which the sizes of the signatures are within some specified 
bounds. The analyzer converts the question of the validity of the assertion into 
a SAT problem and invokes a SAT solver. Based on the result, it reports either 
that the assertion holds within the given bounds, or it produces an instance of 
the model violating the assertion. 

I developed an Alloy model to search for counterexamples to various POR 
claims, such as the one in Sect. 3.1. The model encodes the main concepts of the 
previous two sections, including program, operations, interpretation, invisibility 
and independence, property automaton, the product space, ample selectors and 
the constraints on them, and a language emptiness predicate. The model cul- 
minates in an assertion which states that an ample selector satisfying the four 
constraints is POR-sound. 

I was not able to find a way to encode stutter-invariance. In the end, I 
developed a small set of Büchi automata based on my own intuition of what 
would make interesting tests. I encoded these in Alloy and used the analyzer to 
explore all possible programs and ample selectors for each. 

The first part of the model is a simple encoding of a finite state automaton. 
The following is a listing of file ba.als: 


module ba -- module for simple model of Büchi automata
sig Sigma {} -- alphabet of BA, valuation on atomic props
sig BState {} -- a state in the Büchi Automaton
one sig Binit extends BState {} -- initial state of BA
sig AState in BState {} -- accepting states of BA
-- a transition has a source state, label, and destination state...
sig BTrans { src: one BState, label: one Sigma, dest: one BState }


The alphabet is some unconstrained set Sigma. The set of states is represented 
by signature BState. There is a single initial state, and any number of accepting 
states. Each transition has a source and destination state, and label. Relations 
declared within a signature declaration have that signature as an implicit first 
argument. So, for example, src is a binary relation of type BTrans x BState. 
Furthermore, the relation is many-to-one: each transition has exactly one BState 
atom associated to it by the src relation. 
The remaining concepts are incorporated into module por_v0: 


1  module por_v0 -- on-the-fly POR variant 0, corresponding to [13]
2  open ba -- import the Büchi automata module
3  sig Operation {} -- program operation
4  sig PState { -- program state
5    label: one Sigma, -- the set of propositions which hold in this state
6    enabled: set Operation, -- the set of all operations enabled at this state
7    nextState: enabled -> one PState, -- the next-state function
8    ample: BState -> set Operation -- ample(q,s)
9  }{ all s: BState | ample[s] in enabled } -- ample sets subsets of enabled
10 fun amp[q: PState, s: BState] : set Operation { q.ample[s] }
11 one sig Pinit extends PState {} -- initial program state
12 fact { -- all program states are reachable from Pinit
13   let r = {q, q': PState | some op: Operation | q.nextState[op]=q'} |
14     PState = Pinit.*r
15 }
16 sig ProdState { -- state in the product of program and property automaton
17   pstate: PState, -- the program state component
18   bstate: BState, -- the property state component
19   nextFull: set ProdState, -- all next states in the full product space
20   nextReduced: set ProdState -- all next states in the reduced product space
21 }
22 one sig ProdInit extends ProdState {} -- initial product state
23 pred transitionInProduct[q,q': PState, op: Operation, s,s': BState] {
24   q->op->q' in nextState
25   some t : BTrans | t.src = s and t.dest = s' and t.label = q.label
26 }
27 pred nextProd[x: ProdState, op: Operation, x': ProdState] {
28   transitionInProduct[x.pstate, x'.pstate, op, x.bstate, x'.bstate]
29 }
30 pred independent[op1, op2 : Operation] {
31   all q: PState | (op1+op2 in q.enabled) implies (
32     op2 in q.nextState[op1].enabled and
33     op1 in q.nextState[op2].enabled and
34     q.nextState[op1].nextState[op2] = q.nextState[op2].nextState[op1])
35 }
36 pred invisible[op: Operation] {
37   all q: PState | op in q.enabled => q.nextState[op].label = q.label
38 }
39 fact C0 { all q: PState, s: BState | some q.enabled => some amp[q,s] }
40 fact C1 {
41   all q: PState, s: BState | let A=amp[q,s] |
42     let r = { q1, q2: PState | some op: Operation-A |
43       q1->op->q2 in nextState } |
44     all q': q.*r, op1: q'.enabled-A, op2: A | independent[op1, op2]
45 }
46 fact C2 {
47   all q: PState, s: BState | let A = amp[q,s] |
48     A != q.enabled implies all op: A | invisible[op]
49 }
50 fact C3' {
51   let r = { x, x' : ProdState | x->x' in nextReduced and
52     amp[x.pstate, x'.bstate] != x.pstate.enabled } |
53   no x: ProdState | x in x.^r
54 }
55 fact { -- generate all reachable product states, etc.
56   nextFull = {x,y: ProdState | some op: Operation | nextProd[x,op,y]}
57   nextReduced = {x,y: ProdState |
58     some op: amp[x.pstate, y.bstate] | nextProd[x,op,y]}
59   ProdState = ProdInit.*nextFull
60   all x,y: ProdState | (x.pstate=y.pstate && x.bstate=y.bstate) => x=y
61   ProdInit.pstate = Pinit and ProdInit.bstate = Binit
62   all x: ProdState, op: Operation, q': PState, s': BState |
63     transitionInProduct[x.pstate, q', op, x.bstate, s'] implies
64       some y: ProdState | y.pstate = q' and y.bstate = s'
65 }
66 pred nonemptyLang[r: ProdState->ProdState] { -- r reaches accepting cycle
67   some x: ProdInit.*r | (x.bstate in AState and x in x.^r)
68 }
69 assert PORsoundness { -- if full space has a lasso, so does the reduced
70   nonemptyLang[nextFull] => nonemptyLang[nextReduced]
71 }


The facts are constraints that any instance must satisfy; some of the facts are 
given names for readability. A pred declaration defines a (typed) predicate. 


Most aspects of this model are self-explanatory; I will comment only on the
less obvious features. The relations nextFull and nextReduced represent the
next state relations in the full and reduced spaces, respectively. They are declared 
in ProdState, but specified completely in the final fact on lines 56-58. Strictly 
speaking, one could remove those predicates and substitute their definitions, but 
this seemed more convenient. Line 60 asserts that a product state is determined 
uniquely by its program and property components. Line 61 specifies the initial 
product state. 
Line 59 insists that only states reachable (in the full space) from the initial 
state will be included in an instance (* is the reflexive transitive closure oper- 
ator). Lines 62-64 specify the converse. Hence in any instance of this model, 
ProdState will consist of exactly the reachable product states in the full space. 

The encoding of C1 is based on the following observation: given $q \in Q$ and
a set $A$ of operations enabled at $q$, define $r \subseteq Q \times Q$ by removing from the
program’s next-state relation all edges labeled by operations in A. Then “no 
operation dependent on an operation in A can occur unless an operation in A 
occurs first” is equivalent to the statement that on any path from q using edges 
in r, all enabled operations encountered will either be in A or independent of 
every operation in A. 
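The observation above translates directly into a reachability check. In this sketch, only edges whose operation is outside A are followed from q, and every operation outside A that is enabled at a state reached this way must be independent of every operation in A, as in the C1 fact of the Alloy model. The independence relation is passed in as a function; `check_c1`, the encoding, and the relation used in the example are assumptions for illustration.

```python
from typing import Callable, Dict, Set

Program = Dict[str, Dict[str, str]]  # operation -> {q: alpha(q)}


def check_c1(program: Program, q: str, A: Set[str],
             independent: Callable[[str, str], bool]) -> bool:
    """C1 at q for ample set A, via the edge-removal observation."""
    seen, stack = {q}, [q]
    while stack:
        u = stack.pop()
        for op, fn in program.items():
            if op in A or u not in fn:
                continue  # only edges of r: operations outside A enabled at u
            if not all(independent(op, b) for b in A):
                return False  # a dependent operation could occur before A
            if fn[u] not in seen:
                seen.add(fn[u])
                stack.append(fn[u])
    return True


fig1 = {"alpha": {"A": "A", "B": "B"}, "beta": {"A": "B"}}
I = {("alpha", "beta"), ("beta", "alpha")}  # assumed independence relation
print(check_c1(fig1, "A", {"alpha"}, lambda a, b: (a, b) in I))  # True
print(check_c1(fig1, "A", {"alpha"}, lambda a, b: False))        # False
```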

Condition C3 is difficult to encode, in that it depends on specifying a depth- 
first search. I have replaced it with a weaker condition, which is similar to a 
well-known cycle proviso in the offline theory: 


C3' In any cycle in $\mathrm{reduced}(\mathcal{A}, \mathrm{amp})$, there is a transition from $(q, s)$ to $(q', s')$
for which $\mathrm{amp}(q, s') = \mathrm{en}(q)$.


Equivalently: if one removes from the reduced product space all such transitions, 
then the resulting graph should have no cycles. This is the meaning of lines 50-54 
(^ is the strict transitive closure operator). 
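C3' can likewise be checked on an explicit reduced graph, mirroring lines 50-54 of the Alloy model: delete every transition $(q, s) \to (q', s')$ with $\mathrm{amp}(q, s') = \mathrm{en}(q)$ and test the rest for a cycle with a three-color DFS. The sketch assumes transitions are given as pairs of product states and `en` as a precomputed dict; `check_c3_prime` is an illustrative name.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, int]  # product state (q, s)


def check_c3_prime(reduced: Set[Tuple[Node, Node]], amp: Dict[Node, Set[str]],
                   en: Dict[str, Set[str]]) -> bool:
    """C3': drop every reduced transition (q, s) -> (q', s') with
    amp(q, s') = en(q); the remaining graph must have no cycle."""
    succ: Dict[Node, Set[Node]] = {}
    for (q, s), (q2, s2) in reduced:
        if amp[(q, s2)] != en[q]:
            succ.setdefault((q, s), set()).add((q2, s2))
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[Node, int] = {}

    def has_cycle_from(x: Node) -> bool:
        color[x] = GREY  # x is on the current DFS path
        for y in succ.get(x, set()):
            c = color.get(y, WHITE)
            if c == GREY or (c == WHITE and has_cycle_from(y)):
                return True
        color[x] = BLACK
        return False

    return not any(color.get(x, WHITE) == WHITE and has_cycle_from(x)
                   for x in list(succ))


amp = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}
en = {"A": {"alpha", "beta"}, "B": {"alpha"}}
reduced = {(("A", 0), ("A", 0)), (("A", 0), ("B", 0)), (("A", 0), ("A", 1))}
print(check_c3_prime(reduced, amp, en))  # True
```

On the reachable reduced space of Sect. 3.1 only the transition from A0 to A1 survives the deletion, so C3' holds; a self-loop at (A, 0) with a proper ample set would violate it.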

The next step is to create tests for specific property automata. This example 
is for the automaton $B_1$ of Fig. 1:


module ba1
open ba
one sig X0, X1 extends Sigma {}
one sig B1 extends BState {}
one sig T1, T2, T3 extends BTrans {}
fact {
  AState = B1 -- B1 is the sole accepting state
  T1.src=Binit && T1.label=X0 && T1.dest=Binit
  T2.src=Binit && T2.label=X0 && T2.dest=B1
  T3.src=B1 && T3.label=X1 && T3.dest=B1
}


The final step is a test that combines the modules above: 


open por_v0
open ba1
check PORsoundness for exactly 2 Sigma, exactly 2 BState,
  exactly 3 BTrans, 2 Operation, 2 PState, 4 ProdState


It places upper bounds on the numbers of operations, program states, and prod- 
uct states while checking the soundness assertion. Using the Alloy analyzer to 
check the assertion above results in a counterexample like the one in Fig. 1. The 
runtime is a fraction of a second. The Alloy instance uses two uninterpreted 
atoms for the elements of Sigma; I have simply substituted the sets $\emptyset$ and $\{p\}$
for them to produce Fig. 1. As we have seen, this counterexample happens to 
also satisfy the stronger constraint C3. 
The POR algorithm used by Spin is described in [10] and is similar to the 
combined algorithm. We can see what Spin actually does by encoding examples 
in Promela and executing Spin with and without POR. 


bit p = 0;
active proctype p0() { p=1 }
active proctype p1() { bit x=0; do :: x=0 od }


never {
B0: do :: !p :: !p -> break od
accept_B1: do :: p od
}


Fig. 2. Promela representation of counterexample using $B_1$ of Fig. 1


Figure 2 shows an encoding of the example of Fig. 1. Transition $\alpha$ corresponds
to the assignment x = 0, where x is a variable local to p1. Transition $\beta$ corresponds
to the assignment p = 1, where p is a shared variable. Applying Spin
with the following commands allows one to see the structure of the program 
graphs for each process, as well as each step in the search of the full space: 


spin -a test1.pml; cc -o pan -DCHECK -DNOREDUCE pan.c; ./pan -d; ./pan -a 


I did this with Spin version 6.4.9, the latest stable release. The output indicates 
that 4 states and 5 transitions are explored, and one state is matched—exactly 
as in Fig. 1 (right). As expected, the output also reports a violation—a path to 
an accepting cycle that corresponds to the transition from A0 to B1 followed by 
the self-loop on B1 repeated forever. 

Repeat this experiment without -DNOREDUCE, however, and Spin finds no
errors. The output indicates that it misses the transition from A0 to B1.


6 Ignoring the Intermediate States 


An interesting aspect of the combined algorithm is that the ample set is a function
of an intermediate state. That is, given a product state $x = (q, s)$, the ample set
is determined by the intermediate state $x' = (q, s')$ obtained after executing a
property transition. This introduces a difference between the on-the-fly scheme
and offline schemes, where there is no notion of intermediate state. It also introduces
other complexities. For example, it is possible that $x'$ was reached earlier
in the search through some other state $(q, s_2)$, because of a property transition
$s_2 \xrightarrow{L(q)} s'$. How does the algorithm guarantee that the ample set selected for $x'$
will be the same as the earlier choice? This issue is not addressed in [13] or [10].
These problems go away if one simply makes the ample set a function of the source product state $x$; the intermediate states then play no role. Specifically, given an ample selector amp, define $\mathit{reduced}_2(A, \mathit{amp})$ as in (1) and (2), except replace "$a \in \mathit{amp}(q, s')$" in (2) with "$a \in \mathit{amp}(q, s)$". Perform the same substitution in C3 and call the resulting condition $\mathrm{C3}_1$. The weaker version of $\mathrm{C3}_1$ is simply:


$\mathrm{C3}'_1$ In any cycle in $\mathit{reduced}_2(A, \mathit{amp})$ there is a state $(q, s)$ with $\mathit{amp}(q, s) = \mathit{en}(q)$.


Conditions C0-C2 are unchanged. I refer to this scheme as V1, and to the original combined algorithm as V0. The Alloy model of V0 in Sect. 4 can easily be modified to represent V1.

Using V1, the example of Fig. 1 is no longer a counterexample. In fact, Alloy reports that there are no counterexamples using $B_1$, at least for small bounds on the program size. Figure 5 gives detailed results for this and other Alloy experiments.

Unfortunately, Alloy does find a counterexample for a slightly more complicated property automaton, $B_2$, which is shown in Fig. 3.


Fig. 3. Counterexample to V1 with $B_2$ (center). A0 and A2 have proper ample set $\{\alpha\}$.


The program is the same as the one in Sect. 3.1. Automaton $B_2$ has four states, with state 3 the sole accepting state. The language is the same as that of $B_1$: all infinite words formed by concatenating a finite nonempty prefix of $\emptyset$s and an infinite sequence of $\{p\}$s. If the prefix has odd length, the accepting run begins with the transition $0 \to 1$; otherwise it begins with the transition $0 \to 2$.

In the ample selector, only A0 and A2 have ample sets that are not fully enabled (i.e., proper subsets of the enabled set):


amp | 0   | 1      | 2   | 3
A   | {α} | {α, β} | {α} | {α, β}
B   | {α} | {α}    | {α} | {α}


C0-C2 hold for the reasons given in Sect. 3.1. $\mathrm{C3}'_1$ holds for any DFS in which A2 is pushed onto the stack before A1. In that case, there is no back edge from A2; there will be a back edge when A1 is pushed, but A1 is fully enabled.
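The condition $\mathrm{C3}'_1$ only requires every cycle of the reduced space to contain a fully enabled state. Equivalently, the states whose ample set is a proper subset of the enabled set must induce an acyclic subgraph. The Python sketch below checks this on an explicit, already-constructed graph; the successor-dictionary encoding and the small example graphs are hypothetical, and the sketch does not model the DFS order on which the argument above depends.

```python
from typing import Dict, Hashable, List, Set


def every_cycle_has_full_state(succ: Dict[Hashable, List[Hashable]],
                               full: Set[Hashable]) -> bool:
    """True iff every cycle of the graph `succ` visits a state in `full`.

    Equivalent check: the subgraph induced by the non-full states is acyclic.
    """
    partial = {v for v in succ if v not in full}
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {v: WHITE for v in partial}

    def dfs(v: Hashable) -> bool:
        """Return False if a cycle made only of partial states is reachable."""
        color[v] = GREY
        for w in succ.get(v, []):
            if w not in partial:
                continue  # a fully enabled state breaks any cycle through it
            if color[w] == GREY:
                return False  # back edge: a cycle avoiding every full state
            if color[w] == WHITE and not dfs(w):
                return False
        color[v] = BLACK
        return True

    return all(color[v] != WHITE or dfs(v) for v in partial)
```

Restricting the search to the non-full states keeps the check linear in the size of the graph, since each state and edge is examined at most once.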
7 What’s Right


In this section, I show that POR scheme V1 of Sect. 6 is sound if one introduces certain assumptions on the property automaton. The following definition is similar to the notion of stutter invariant (SI) automaton in [6] and to that of closure under stuttering in [9]. The main differences derive from the use of Muller automata in [6] and Büchi transition systems in [9], while we are dealing with ordinary Büchi automata.


Definition 12. A Büchi automaton $B = (S, \{s_{\mathit{init}}\}, \Sigma, \delta, F)$ is in SI normal form if it has a single initial state $s_{\mathit{init}}$ with no incoming edges, and for each $s \in S \setminus \{s_{\mathit{init}}\}$, there is some $a_s \in \Sigma$ such that the following all hold:


1. Every edge terminating in $s$ is labeled $a_s$.

2. $s$ has exactly one outgoing edge with label $a_s$.

3. If $s \notin F$ then $(s, a_s, s) \in \delta$.

4. If $(s, a_s, s) \notin \delta$, then there exists $s^+ \in S \setminus F$ such that (i) $(s, a_s, s^+) \in \delta$ and (ii) for all $a \in \Sigma$ and $s' \in S$, $(s, a, s') \in \delta \Leftrightarrow (s^+, a, s') \in \delta$.
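The four conditions of Definition 12 can be checked mechanically on a small, explicitly listed automaton. The Python sketch below is illustrative only: edges are assumed to be (source, label, target) triples, the labels "e" and "p" are hypothetical stand-ins for $\emptyset$ and $\{p\}$, and condition 4(ii) is tested by comparing the sets of outgoing edges of $s$ and $s^+$. The first example is a hypothetical three-state automaton for the language $\emptyset^+\{p\}^\omega$ described in Sect. 6.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable, Hashable]  # (source, label, target)


def successors(delta: Set[Edge], s: Hashable) -> Set[Tuple[Hashable, Hashable]]:
    """The (label, target) pairs of the edges leaving s."""
    return {(a, t) for (x, a, t) in delta if x == s}


def ok_for_label(s, a_s, states, delta, accepting) -> bool:
    """Conditions 1-4 of Definition 12 for state s and candidate label a_s."""
    # 1. Every edge terminating in s is labeled a_s.
    if any(t == s and a != a_s for (_, a, t) in delta):
        return False
    succ = successors(delta, s)
    # 2. s has exactly one outgoing edge with label a_s.
    if sum(1 for (a, _) in succ if a == a_s) != 1:
        return False
    has_loop = (a_s, s) in succ
    # 3. Non-accepting states carry the a_s self-loop.
    if s not in accepting and not has_loop:
        return False
    # 4. Without the self-loop, the a_s edge must lead to a non-accepting
    #    twin s_plus whose outgoing edges are exactly those of s.
    if not has_loop:
        return any((a_s, sp) in succ and successors(delta, sp) == succ
                   for sp in states - accepting)
    return True


def is_si_normal_form(states, init, alphabet, delta, accepting) -> bool:
    # The single initial state must have no incoming edges.
    if any(t == init for (_, _, t) in delta):
        return False
    return all(any(ok_for_label(s, a, states, delta, accepting) for a in alphabet)
               for s in states - {init})
```

For example, the automaton with edges i -e-> n, n -e-> n, n -p-> f, f -p-> f and accepting state f passes the check, whereas dropping the self-loop on the non-accepting state n violates condition 2.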


Lemma 1. Let B be a Büchi automaton in SI normal form. Suppose $a, b \in \Sigma$ and $a \neq b$. Both of the following hold:


1. If $s_1 \xrightarrow{a} s_2 \xrightarrow{b} s_3$ is a path in B, then for some $s'_2 \in S$, $s_1 \xrightarrow{a} s_2 \xrightarrow{a} s'_2 \xrightarrow{b} s_3$ is a path in B.

2. If $s_1 \xrightarrow{a} s_2 \xrightarrow{a} s_3 \xrightarrow{b} s_4$ is a path in B, then $s_1 \xrightarrow{a} s_2 \xrightarrow{b} s_4$ is a path in B. Moreover, if $s_3$ is accepting, then $s_2$ is accepting.


Following the approach of [6], one can show that the language of an automaton in SI normal form is stutter-invariant. Moreover, any Büchi automaton with a stutter-invariant language can be transformed into SI normal form without changing the language. The conversion satisfies $|S'| = O(|\Sigma| \cdot |S|)$, where $|S|$ and $|S'|$ are the number of states in the original and new automaton, respectively. For details and proofs, see [17]. An example is given in Fig. 4; the language of $B_3$ (or $B_4$) consists of all words with a finite number of $\{p\}$s.


Fig. 4. Property automaton $B_3$ and result of transformation to SI normal form, $B_4$.


Theorem 1. Suppose B is in SI normal form and amp, which maps each product state in $Q \times S$ to a set of program operations, is an ample selector satisfying C0-C2 and $\mathrm{C3}'_1$. Then amp is POR-sound.
The remainder of this section is devoted to the proof of Theorem 1. The proof
is similar to the proof of the offline case in [4]. 

Let $\theta$ be an accepting path in the full space A. An infinite sequence of accepting paths $\pi_0, \pi_1, \ldots$ will be constructed, where $\pi_0 = \theta$. For each $i \geq 0$, $\pi_i$ will be decomposed as $\eta_i \circ \theta_i$, where $\eta_i$ is a finite path of length $i$ in the reduced space, $\theta_i$ is an infinite path, $\eta_i$ is a prefix of $\eta_{i+1}$, and $\circ$ denotes concatenation. For $i = 0$, $\eta_0$ is empty and $\theta_0 = \theta$.

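Assume $i \geq 0$ and we have defined $\eta_j$ and $\theta_j$ for $j \leq i$. Write

$$\theta_i = (q_0, s_0) \xrightarrow{(a_1, \sigma_0)} (q_1, s_1) \xrightarrow{(a_2, \sigma_1)} \cdots \qquad (3)$$

where $\sigma_k = L(q_k)$ for $k \geq 0$. Then $\eta_{i+1}$ and $\theta_{i+1}$ are defined as follows. Let $A = \mathit{amp}(q_0, s_0)$. There are two cases: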


Case 1: $a_1 \in A$. Let $\eta_{i+1}$ be the path obtained by appending the first transition of $\theta_i$ to $\eta_i$, and $\theta_{i+1}$ the path obtained by removing the first transition from $\theta_i$.


Case 2: $a_1 \notin A$. Then there are two sub-cases:


Case 2a: Some operation in A occurs in $\theta_i$. Let $n$ be the index of the first occurrence, so that $a_n \in A$, but $a_j \notin A$ for $1 \leq j < n$. By C1, $a_j$ and $a_n$ are independent for $1 \leq j < n$. By repeated application of the independence property, there are paths in P


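$$\begin{array}{ccccccccccccccc}
q_0 & \xrightarrow{a_1} & q_1 & \xrightarrow{a_2} & q_2 & \xrightarrow{a_3} & \cdots & \xrightarrow{a_{n-2}} & q_{n-2} & \xrightarrow{a_{n-1}} & q_{n-1} & & & & \\
\downarrow{\scriptstyle a_n} & & \downarrow{\scriptstyle a_n} & & \downarrow{\scriptstyle a_n} & & & & \downarrow{\scriptstyle a_n} & & \downarrow{\scriptstyle a_n} & & & & \\
q'_1 & \xrightarrow{a_1} & q'_2 & \xrightarrow{a_2} & q'_3 & \xrightarrow{a_3} & \cdots & \xrightarrow{a_{n-2}} & q'_{n-1} & \xrightarrow{a_{n-1}} & q_n & \xrightarrow{a_{n+1}} & q_{n+1} & \xrightarrow{a_{n+2}} & \cdots
\end{array}$$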


By C2, $a_n$ is invisible, whence $L(q'_{j+1}) = \sigma_j$ for $0 \leq j \leq n-2$, and $\sigma_{n-1} = \sigma_n$. Hence the admissible sequence


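$$q_0 \xrightarrow{a_n} q'_1 \xrightarrow{a_1} q'_2 \xrightarrow{a_2} q'_3 \xrightarrow{a_3} \cdots \xrightarrow{a_{n-2}} q'_{n-1} \xrightarrow{a_{n-1}} q_n \xrightarrow{a_{n+1}} q_{n+1} \xrightarrow{a_{n+2}} \cdots \qquad (4)$$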


generates the word 


$$\sigma_0 \sigma_0 \sigma_1 \sigma_2 \cdots \sigma_{n-2} \sigma_n \sigma_{n+1} \sigma_{n+2} \cdots \qquad (5)$$


Now the projection of $\theta_i$ onto B has the form


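$$s_0 \xrightarrow{\sigma_0} s_1 \xrightarrow{\sigma_1} s_2 \xrightarrow{\sigma_2} \cdots \xrightarrow{\sigma_{n-2}} s_{n-1} \xrightarrow{\sigma_n} s_n \xrightarrow{\sigma_n} s_{n+1} \xrightarrow{\sigma_{n+1}} s_{n+2} \xrightarrow{\sigma_{n+2}} \cdots$$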


since $\sigma_{n-1} = \sigma_n$. By Lemma 1, there is a path in B


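$$s_0 \xrightarrow{\sigma_0} s_1 \xrightarrow{\sigma_0} s'_1 \xrightarrow{\sigma_1} s_2 \xrightarrow{\sigma_2} \cdots \xrightarrow{\sigma_{n-2}} s_{n-1} \xrightarrow{\sigma_n} s_n \xrightarrow{\sigma_{n+1}} s_{n+2} \xrightarrow{\sigma_{n+2}} \cdots \qquad (6)$$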


which accepts the word (5). Composing (4) and (6) therefore gives a path through the product space. Removing the first transition (labeled $(a_n, \sigma_0)$) from this path yields $\theta_{i+1}$. Appending that transition to $\eta_i$ yields $\eta_{i+1}$.


What’s Wrong with On-the-Fly Partial Order Reduction 491 


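Case 2b: No operation in A occurs in $\theta_i$. By C0, A is nonempty. Let $b \in A$. By C1, every operation in $\theta_i$ is independent of $b$; moreover, since $a_1$ is enabled but $a_1 \notin A$, the ample set is not fully enabled, so $b$ is invisible by C2. With an argument that is similar to the one for Case 2a, we can see there is a path in the product space for which the projection onto the program component has the form

$$q_0 \xrightarrow{b} q'_1 \xrightarrow{a_1} q'_2 \xrightarrow{a_2} q'_3 \xrightarrow{a_3} \cdots$$

and the projection onto the property component has the form

$$s_0 \xrightarrow{\sigma_0} s_1 \xrightarrow{\sigma_0} s'_1 \xrightarrow{\sigma_1} s_2 \xrightarrow{\sigma_2} \cdots$$

Removing the first transition from this path yields $\theta_{i+1}$. Appending that transition to $\eta_i$ yields $\eta_{i+1}$. This completes the definitions of $\eta_{i+1}$ and $\theta_{i+1}$.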

Let $\eta$ be the limit of the $\eta_i$. Clearly $\eta$ is an infinite path through the reduced product space, starting from the initial state. We must show that it passes through an accepting state infinitely often. To do so, we must examine more closely the sequence of property states through which each $\theta_i$ passes.

Let $i \geq 0$, and let $s_0$ be the final property state of $\eta_i$. Say $\theta_i$ passes through property states $s_0, s_1, s_2, \ldots$. Then the final property state of $\eta_{i+1}$ will be $s_1$, and the state sequence of $\theta_{i+1}$ is determined by the three cases as follows:


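$$\begin{array}{ll}
\text{Case 1:} & s_1 s_2 \cdots \\
\text{Case 2a:} & s_1 s'_1 s_2 \cdots s_n s_{n+2} \cdots \quad (s_{n+1} \in F \Rightarrow s_n \in F) \\
\text{Case 2b:} & s_1 s'_1 s_2 \cdots
\end{array} \qquad (7)$$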


We first claim that for all $i \geq 0$, $\theta_i$ passes through an accepting state infinitely often. This holds for $\theta_0$, which is an accepting path by assumption. Assume it holds for $\theta_i$. In each case of (7), we see that the state sequence of $\theta_{i+1}$ has a suffix which is a suffix of the state sequence of $\theta_i$, so the claim holds for $\theta_{i+1}$.


Definition 13. For any path $\xi = s_0 \to s_1 \to \cdots$ through B which passes through an accepting state infinitely often, define the accepting distance of $\xi$, written $AD(\xi)$, to be the minimum $k \geq 1$ for which $s_k$ is accepting.
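For an eventually periodic (lasso-shaped) path, the accepting distance of Definition 13 is easy to compute. The Python sketch below assumes the path is given as a finite prefix followed by a loop repeated forever; this representation is an assumption made for illustration, since the paths $\theta_i$ in the proof need not be lasso-shaped. Note that $s_0$ itself never counts, even if it is accepting.

```python
from itertools import chain, cycle, islice
from typing import Hashable, Sequence, Set


def accepting_distance(prefix: Sequence[Hashable], loop: Sequence[Hashable],
                       accepting: Set[Hashable]) -> int:
    """AD of the lasso path prefix . loop^omega: least k >= 1 with s_k accepting."""
    if not any(s in accepting for s in loop):
        raise ValueError("the path must visit an accepting state infinitely often")
    states = chain(prefix, cycle(loop))
    # Skip position 0 (s_0): the accepting distance only looks at k >= 1.
    for k, s in enumerate(islice(states, 1, None), start=1):
        if s in accepting:
            return k
    raise AssertionError("unreachable: the loop contains an accepting state")
```

For instance, the path f, a, f, a, ... with accepting state f has accepting distance 2, not 0.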


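Lemma 2. Let $i \geq 0$ and say the state sequence of $\theta_i$ is $s_0 s_1 s_2 \cdots$. If $s_1$ is not accepting then one of the following holds:


- Case 1 holds and $AD(\theta_{i+1}) < AD(\theta_i)$, or
- Case 2a or 2b holds and $AD(\theta_{i+1}) \leq AD(\theta_i)$.


Proof. If $s_1$ is not accepting then there is some $k \geq 2$ for which $s_k$ is accepting; let $k$ be the least such index. Moreover, $s'_1$ is not accepting: the edge $s_0 \to s_1$ is labeled $\sigma_0$, so by Definition 12 the unique outgoing $\sigma_0$-edge of the non-accepting state $s_1$ is its self-loop, whence $s'_1 = s_1$. The result follows by examining (7). In Case 1, the accepting distance decreases by 1. In Case 2a, the accepting distance is either unchanged (if $k \leq n$) or decreases by 1 (if $k > n$). In Case 2b, the accepting distance is unchanged.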
Lemma 3. For infinitely many $i \geq 0$, Case 1 holds for $\theta_i$.


Proof. Suppose not. Then there is some $i \geq 0$ such that Case 2 holds for all $j \geq i$. Let $a_1$ be the first program operation of $\theta_i$. Then $a_1$ is the first program operation of $\theta_j$ for all $j \geq i$. Furthermore, for all $j \geq i$, $a_1$ is not in the ample set of the final state of $\eta_j$. Since the product space has only a finite number of states, this means there is a cycle in the reduced space for which $a_1$ is enabled but never in the ample set, contradicting $\mathrm{C3}'_1$.


We now show that $\eta$ passes through an accepting state infinitely often. Note that, if $AD(\theta_i) = 1$, an accepting state is added to $\eta_i$ to form $\eta_{i+1}$. Suppose $\eta$ does not pass through an accepting state infinitely often. Then there is some $i \geq 0$ such that for all $j \geq i$, $AD(\theta_j) > 1$. By Lemma 2, $(AD(\theta_j))_{j \geq i}$ is a nonincreasing sequence of positive integers, and by Lemma 3, this sequence strictly decreases infinitely often, a contradiction. This completes the proof of Theorem 1.


Remark 1. The proof goes through with minor modifications for V0 in place of V1. Let $A = \mathit{amp}(q_0, s_1)$ instead of $\mathit{amp}(q_0, s_0)$. In Case 2a (similarly in 2b), note that the first transition $s_0 \xrightarrow{\sigma_0} s_1$ in the path in B remains in the new path (6).


8 Summary of Experimental Results and Conclusion 


We have seen that standard ways of combining POR and on-the-fly model checking are unsound. This is not only a theoretical issue: the defect in the algorithm is realized in Spin, which can produce an incorrect result. A modification (V1) seems to help, but is still not enough to guarantee soundness for every Büchi automaton with a stutter-invariant language. However, any such automaton can be transformed into a normal form for which algorithm V1 is sound.


V    BA    Sigma  BState  BTrans  Operation  PState  ProdState  Time (s)  Result
V0   B1    2      2       3       ≤2         ≤2      ≤4         0.3       ✗
V1   B1    2      2       3       ≤3         ≤5      ≤10        42.3      ✓
V0   B2    2      4       6       ≤2         ≤2      ≤6         0.4       ✗
V1   B2    2      4       6       ≤2         ≤2      ≤6         0.3       ✗
V0   B3    2      2       4       ≤3         ≤5      ≤10        256.3     ✓
V1   B3    2      2       4       ≤3         ≤5      ≤10        20.7      ✓
V0   B4    2      4       9       ≤3         ≤4      ≤16        39.5      ✓
V1   B4    2      4       9       ≤3         ≤4      ≤16        37.7      ✓
V0   B_S   ≤3     ≤4      ≤6      ≤3         ≤4      ≤16        2264.9    ✓
V1   B_S   ≤3     ≤4      ≤6      ≤3         ≤4      ≤16        1653.9    ✓


Fig. 5. Bounded verification of soundness of POR schemes V0 and V1 on various Büchi automata using Alloy. $B_S$ represents all automata in SI normal form within the bounds. Each run resulted in either a counterexample (✗) or not (✓).


Alloy proved useful for reasoning about the algorithms and generating small counterexamples. A summary of the Alloy experiments and results is given in Fig. 5. These were run on an 8-core 3.7GHz Intel Xeon W-2145 and used the plingeling SAT solver [1].³ In addition to the experiments already discussed, Alloy found no soundness counterexamples for property automata $B_3$ or $B_4$, using V0 or V1. In the case of $B_4$, this is what Theorem 1 predicts. For further confirmation of Theorem 1, I constructed a general Alloy model of Büchi automata in SI normal form, represented by $B_S$ in the table. Alloy confirms that both V0 and V1 are sound for all such automata within small bounds on program and automaton size.

It is possible that the use of the normal form, while correct, cancels out the benefits of POR. A comprehensive exploration of this issue is beyond the scope of this paper, but I can provide data on one non-trivial example. I encoded an n-process version of Peterson’s mutual exclusion algorithm in Promela, and used Spin to verify starvation-freedom for one process in the case n = 5. If p is the predicate that holds whenever the process is enabled, a trace violates this property if p holds only a finite number of times in the trace, i.e., if the trace is in $\mathcal{L}(B_3) = \mathcal{L}(B_4)$. Figure 6 shows the results of Spin verification using $B_3$ without POR, and using $B_3$ and $B_4$ with POR. The results indicate that POR significantly improves performance on this problem, and that using the normal form $B_4$ in place of $B_3$ actually improves performance further by a small amount.


BA   POR | States (stored)   Transitions   Time (s)   Result
B3   N   | 18,964,912        116,510,960   25.8       ✓
B3   Y   | 4,742,982         13,823,705    3.6        ✓
B4   Y   | 4,719,514         12,503,008    3.4        ✓


Fig. 6. Spin verification of starvation-freedom for 5-process Peterson. Using the SI normal form $B_4$ instead of the smaller $B_3$ has little impact on performance.
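The relative effect of POR and of the normal form can be read off Fig. 6 as simple ratios. The Python snippet below copies the numbers from the table and divides them column by column (stored states, transitions, time); the dictionary keys are only labels chosen here.

```python
# Figures copied from Fig. 6: (stored states, transitions, time in seconds).
runs = {
    "B3 without POR": (18_964_912, 116_510_960, 25.8),
    "B3 with POR": (4_742_982, 13_823_705, 3.6),
    "B4 with POR": (4_719_514, 12_503_008, 3.4),
}


def ratio(a: str, b: str) -> tuple:
    """Column-wise ratio runs[a] / runs[b], rounded to two decimals."""
    return tuple(round(x / y, 2) for x, y in zip(runs[a], runs[b]))


print(ratio("B3 without POR", "B3 with POR"))  # (4.0, 8.43, 7.17)
print(ratio("B3 with POR", "B4 with POR"))     # (1.0, 1.11, 1.06)
```

POR thus stores about 4 times fewer states and explores about 8.4 times fewer transitions for $B_3$, while switching from $B_3$ to $B_4$ under POR saves roughly a further 10% of transitions.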


It is likely that V1 is sound for other interesting classes of automata. Observe, for example, that $B_2$ of Fig. 3 has states u such that the language of the automaton with u considered as the initial state is not stutter-invariant. If we restrict to automata in which every state has a stutter-invariant language, is V1 sound? I have neither a proof nor a counterexample. (This is certainly not true of V0, as $B_1$ is a counterexample.) To explore this question, it would help to find a way to encode the stutter-invariant property, or a suitable approximation, in Alloy.

Finally, the proof of Theorem 1 is complicated and might also be flawed. Recent work mechanizing such proofs [3] represents an important advance in raising the level of assurance in model checking algorithms. It would be interesting to see if the proof of this theorem is amenable to such methods. However, constructing such proofs requires far more effort than the Alloy approach described here. One possible approach moving forward is to use tools such as Alloy when prototyping a new algorithm, to get feedback quickly and root out bugs. Once Alloy no longer finds any counterexamples, one could then expend the considerable effort required to construct a formal mechanized proof.


³ All artifacts needed to reproduce the experiments reported in this paper can be downloaded from http://vsl.cis.udel.edu/cav19.
Abstract. Formal verification of real-time systems is attractive because these systems often perform critical operations. Unlike in non-real-time systems, latency and response time guarantees are as important in this setting as functional correctness. Nevertheless, formal verification of real-time OSes usually stops the scheduling analysis at the policy level: it only proves that the scheduler (or its abstract model) satisfies some scheduling policy. In this paper, we go further and connect Prosa, a verified schedulability analyzer, and RT-CertiKOS, a verified single-core sequential real-time OS kernel. Thus, we get a more general and extensible schedulability analysis proof for RT-CertiKOS, as well as a concrete implementation validating Prosa models. This work also shows that it is realistic to connect two completely independent formal developments in a proof assistant.


Keywords: Formal methods - Proof assistant - Real-time scheduling - 
OS kernel - Schedulability analysis 


1 Introduction 


The real-time and OS communities have seen recent efforts towards formal proofs, through several techniques such as model checking [16,22] and interactive theorem provers [7,14,17]. This trend is motivated by the high stakes of critical systems and the combinatorial complexity of considering all possible interleavings of states of a system, which makes pen-and-paper reasoning too error-prone. Real-time OSes used in critical areas such as avionics and automotive applications must ensure not only functional correctness but also timing requirements. Indeed, a missed deadline may have catastrophic consequences. Schedulability analysis aims to guarantee the absence of deadline misses given a scheduling algorithm, which decides which task is going to execute.
© The Author(s) 2019 
In the current state of the art, schedulability analysis is decoupled from kernel code verification. This is good from a separation-of-concerns perspective, as both kernel verification and schedulability analysis are already complex enough without adding in the other. Nevertheless, this gap also means that each community may lack validation from the other.

On the one hand, schedulability analysis itself is error-prone, e.g., a flaw was 
found in the original schedulability analysis [26,27,29] for the Controller Area 
Network bus, which is widely used in automobiles. To tackle this issue, the Prosa
library [7] provides mechanized schedulability proofs. This library is developed 
with a focus on readable specifications in order to ensure wide acceptance by 
the community. It is currently a reference for mechanized schedulability proofs 
and was able to verify several existing multicore scheduling policies under a new 
setting with jitter. However, some of its design decisions, in particular for task 
models and scheduling policies, are highly unusual and their adequacy to reality 
has never been justified by connecting them to a concrete OS kernel enforcing a 
real-time scheduling policy. 

On the other hand, OS kernels are very sensitive and bug-prone pieces of code, 
which has inspired a lot of existing work on using formal methods to prove functional
correctness and other requirements, such as access control policies [17], schedul- 
ing policies [31], timing requirements, etc. One such verified OS kernel is RT- 
CertiKOS [21], developed by the Yale FLINT group and built on top of the sequen- 
tial CertiKOS [9,13]. Its verification focuses on extensions beyond pure functional 
correctness, such as real-time guarantees and isolation between components. However, any major extension such as real-time support adds a significant proof burden.

In this paper, we solve both problems at once by combining the formal schedu- 
lability analysis given by Prosa with the functional correctness guarantees of RT- 
CertiKOS. Thus, we get a formal schedulability proof for this kernel: if it accepts a 
task set, then formal proofs ensure that there will be no deadline miss during exe- 
cution. Furthermore, this work also produces a concrete instance of the definitions 
used in Prosa, ensuring their consistency and adequacy with a real system. 


Contributions. In this paper, we make the following contributions: 


- Definition of a clear interface for schedulability analysis between a kernel (here, RT-CertiKOS) and a schedulability analyzer (here, Prosa);

- A workaround, through the scheduling trace, for the mismatch between the notion of jobs in schedulability analysis (which includes the actual execution time) and in OS scheduling;

- A way to extend a finite scheduling trace (from RT-CertiKOS) into an infinite one (for Prosa) while still satisfying the fixed priority preemptive (FPP) scheduling policy;

- A formally proven connection between RT-CertiKOS and Prosa, validating Prosa modeling choices and enabling RT-CertiKOS to benefit from the state-of-the-art schedulability results of Prosa.
Outline of the Paper. Section 2 introduces the Prosa library and its description of scheduling. In Sect. 3, we describe RT-CertiKOS, its scheduler, and the associated verification technique, abstraction layers. Section 4 then highlights the key differences between the models of Prosa and RT-CertiKOS, and how we resolve them. Finally, Sects. 5, 6, and 7 evaluate our work and present future work and related work before concluding.


2 Prosa 


Prosa [7] is a Coq [25] library of models and analyses for real-time systems. 
The library is aimed towards the real-time community and provides models and 
analyses found in the literature with a focus on readable specifications. 


[Figure: the four Prosa layers (Implementation, System behavior, System model, Analysis), with blocks including Concrete schedule, Schedule, and Scheduling policy; a legend distinguishes “used in” and “instantiates” arrows.]


Fig. 1. An overview of Prosa layers


The library contains four basic layers, which are presented in Fig. 1: 


System behavior. The base of the library is a model of discrete time traces 
as infinite sequences of events. We consider two kinds of such sequences: arrival sequences, which record requests for service called job activations, and schedules, which record which job is able to progress.

System model. In order to reason about system behavior, jobs with simi- 
lar properties are grouped into tasks. Based on system behavior, task mod- 
els (arrival patterns and cost models) and scheduling policies are defined. 
These models are axiomatic in the sense that they are given as predicates on 
traces/schedules and not as generating and scheduling functions. In particular, an “FPP scheduler” (see Sect. 2.2) is modeled as “any trace satisfying the FPP policy”.
Analysis. The library provides response time and schedulability analyses for
these models. 

Implementation. Finally, examples of traces and schedulers are implemented 
to validate the specifications axiomatized in the System model layer and to 
use the results proven in the Analysis layer. It is this part (more precisely, 
the top left dark block of Fig. 1) that is meant to connect with RT-CertiKOS.


2.1 System Behavior 


The basic definitions in Prosa concern concrete system behavior. The notion of 
time used in the library corresponds to scheduling ticks: durations are given in 
number of ticks and instants are given as number of ticks from initialization of 
the system. For this paper, we focus on single-core systems¹ on which instances of a finite set TaskSet of tasks are scheduled. To each task $\tau$ is associated a relative deadline $D_\tau$, which corresponds to the delay we want to guarantee between the activation of an instance of a task and its completion. We defer the definition of tasks (Definition 4) until their parameters are relevant and focus first on the modeling of system behavior in Prosa. The instances of tasks which are to be scheduled are called jobs.


Definition 1 (Job). A job $j$ is defined by a task $\tau_j$, a positive cost $c_j$, and a unique identifier.


We do not use the identifier directly; it is only used to distinguish jobs of the same task in traces.

These jobs are used to describe the workload to be scheduled. This workload 
is defined by an arrival sequence which is a trace of job activations. 


Definition 2 (Arrival sequence). An arrival sequence is a function $\rho$ mapping any time instant $t$ to a finite (possibly empty) set of jobs $\rho(t)$. A job can appear at most once in an arrival sequence.


Since a job $j$ can appear at most once in an arrival sequence $\rho$, we can define its arrival time $a_\rho(j)$ in $\rho$ as the instant $t$ such that $j \in \rho(t)$.

We do not model the scheduler as a function; instead, we work with schedules over an arrival sequence, which are traces of scheduled jobs.


Definition 3 (Schedule). A schedule over an arrival sequence $\rho$ is a function $\sigma$ which maps any time instant $t$ to either a job appearing in $\rho$ or $\bot$.


The symbol $\bot$ is used for instants at which no job is scheduled. Given an arrival sequence $\rho$ and a schedule $\sigma$ over $\rho$, a job $j \in \rho$ is said to be scheduled at an instant $t$ if $\sigma(t) = j$. The service received by $j$ up to time $t$ is the number of instants before $t$ at which $j$ is scheduled. A job $j$ is said to be complete at time $t$ if its service received up to time $t$ is equal to its cost $c_j$, and $j$ is said to be pending at time $t$ if it has arrived before time $t$ and is not complete at time $t$. From now on, we require schedules to only schedule pending jobs. A job $j$ is said to be schedulable if it is complete by its absolute deadline $d_j := a_\rho(j) + D_{\tau_j}$.
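Definitions 1 to 3 and the notions of service, completion, and schedulability can be mirrored in a few lines of Python. This is a sketch, not Prosa's Coq code: it assumes a finite horizon, represents the arrival sequence $\rho$ as a dictionary from instants to job lists and the schedule $\sigma$ as a list in which None plays the role of $\bot$, and all names (Job, service, schedulable, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Job:
    task: str   # the task tau_j of the job
    cost: int   # positive cost c_j
    ident: int  # unique identifier, only used to tell jobs of a task apart


def arrival_time(arrivals: Dict[int, List[Job]], j: Job) -> int:
    """a_rho(j): the unique instant t with j in rho(t)."""
    (t,) = [t for t, jobs in arrivals.items() if j in jobs]  # fails unless unique
    return t


def service(schedule: List[Optional[Job]], j: Job, t: int) -> int:
    """Number of instants before t at which j is scheduled (None stands for bottom)."""
    return sum(1 for u in range(min(t, len(schedule))) if schedule[u] == j)


def complete(schedule: List[Optional[Job]], j: Job, t: int) -> bool:
    return service(schedule, j, t) == j.cost


def schedulable(arrivals: Dict[int, List[Job]], schedule: List[Optional[Job]],
                j: Job, rel_deadline: Dict[str, int]) -> bool:
    """j is complete by its absolute deadline d_j = a_rho(j) + D_{tau_j}."""
    return complete(schedule, j, arrival_time(arrivals, j) + rel_deadline[j.task])
```

With a job of cost 2 arriving at 0 and a job of cost 1 arriving at 1, the schedule [j1, j2, j1, None] completes both jobs by instant 3.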


1 Multicore systems are handled by Prosa but we do not consider them here. 
2.2 System Model


Task Model. In order to specify the behavior of the system we are interested 
in, Prosa introduces predicates on traces for which the response time analysis 
provides guarantees. 

We now focus on the definitions related to the sporadic task model and the 
fixed priority preemptive (FPP) scheduling policy. 


Definition 4 (Sporadic FPP task). A sporadic FPP task $\tau$ is defined by a deadline $D_\tau \in \mathbb{N}$, a minimal inter-arrival time $\delta_\tau \in \mathbb{N}$, a worst case execution time (WCET) $C_\tau$, and a priority $p_\tau \in \mathbb{N}$. When $D_\tau$ is equal to $\delta_\tau$, the deadline is said to be implicit.


Sporadic Task Model. The sporadic task model is specified by a sporadic 
arrival model and a cost model. 

In the sporadic arrival model, consecutive activations of a task $\tau$ are separated by a minimum distance $\delta_\tau$: an arrival sequence $\rho$ is sporadic if for any two distinct jobs $j_1, j_2 \in \rho$ of the same task $\tau$, $|a_\rho(j_1) - a_\rho(j_2)| \geq \delta_\tau$. Periodic arrivals are a particular case of this model, where $\delta_\tau$ is the period and jobs arrive exactly at intervals of $\delta_\tau$. This is sufficient for us, as the schedulability analysis for FPP yields the same bounds for sporadic and periodic activations.

The considered cost model is a constraint on activations: jobs in the arrival sequence must respect the WCET of their task, that is, for any $j \in \rho$, $c_j \leq C_{\tau_j}$.
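The sporadic arrival model and the cost model are both predicates on arrival sequences, so they can be checked directly on a finite prefix. The Python sketch below uses the same kind of hypothetical dictionary encoding of $\rho$ (instants mapped to job lists), with min_sep[τ] standing for $\delta_\tau$ and wcet[τ] for $C_\tau$; it is an illustration, not Prosa's definition.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Job:
    task: str
    cost: int
    ident: int


def is_sporadic_within_wcet(arrivals: Dict[int, List[Job]],
                            min_sep: Dict[str, int],
                            wcet: Dict[str, int]) -> bool:
    """Check the sporadic arrival model and the cost model on a finite arrival prefix."""
    arrived = [(t, j) for t, jobs in arrivals.items() for j in jobs]
    # Cost model: c_j <= C_{tau_j} for every job j.
    if any(j.cost > wcet[j.task] for _, j in arrived):
        return False
    # Sporadic model: two distinct jobs of the same task arrive >= delta_tau apart.
    return all(abs(t1 - t2) >= min_sep[j1.task]
               for (t1, j1), (t2, j2) in combinations(arrived, 2)
               if j1.task == j2.task)
```

Jobs of different tasks may arrive at the same instant; only jobs of the same task are constrained by the minimal inter-arrival time.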


FPP Scheduling Policy. The FPP policy is modeled in Prosa as two con- 
straints on the schedule: it must be work conserving, that is, it cannot be idle 
when there are pending jobs; and it must respect the priorities, that is, a scheduled job always has the highest priority among pending jobs.
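Since the FPP policy is also axiomatized as predicates on the schedule, a finite prefix of a schedule can be checked against it. The Python sketch below rests on two assumptions that are not part of the text: a job arriving at instant $t$ is treated as pending (and may run) at $t$, and a larger priority number means a higher priority. The encoding of $\rho$ and $\sigma$ is the same hypothetical one as above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Job:
    task: str
    cost: int
    ident: int


def respects_fpp(arrivals: Dict[int, List[Job]],
                 schedule: List[Optional[Job]],
                 priority: Dict[str, int]) -> bool:
    """Work conservation and priority respect on a finite schedule prefix."""
    arrival = {j: t for t, jobs in arrivals.items() for j in jobs}
    received = {j: 0 for j in arrival}  # service received so far
    for t, running in enumerate(schedule):
        pending = [j for j, a in arrival.items() if a <= t and received[j] < j.cost]
        if running is None:
            if pending:  # idle while a job is pending: not work conserving
                return False
            continue
        if running not in pending:  # only pending jobs may be scheduled
            return False
        if priority[running.task] < max(priority[j.task] for j in pending):
            return False  # a higher-priority job was left waiting
        received[running] += 1
    return True
```

For a low-priority job of cost 2 arriving at 0 and a high-priority job of cost 1 arriving at 1, the schedule [lo, hi, lo, None] is accepted, while [lo, lo, hi, None] violates priority respect.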


2.3 Analysis 


Prosa contains a proof of Bertogna and Cirinei’s [4] response time analysis for 
FPP single-core schedules of sporadic tasks, with exact bounds for implicit dead- 
lines. The analysis is based on the following property of the maximum workload 
for these schedules. 


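Definition 5 (Maximum Workload). Given a task $\tau \in \mathit{TaskSet}$ and a duration $\Delta$, the maximum workload of the system w.r.t. $\tau$ within that duration is

$$W_\tau(\Delta) := \sum_{\substack{\tau' \in \mathit{TaskSet} \\ p_{\tau'} \geq p_\tau}} C_{\tau'} \times \left\lceil \frac{\Delta}{\delta_{\tau'}} \right\rceil$$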


The maximum workload $W_\tau(\Delta)$ corresponds to the worst case activation pattern in which all tasks are simultaneously activated with maximum cost (the WCET of their task) and minimal inter-arrival distance. It is an upper bound on the amount of service required to schedule activations of the tasks with a priority higher than or equal to $p_\tau$ in any interval of size $\Delta$. Based on this property, we can derive a response time bound for our system model if we can find a $\Delta$ at least as large as $W_\tau(\Delta)$.


Theorem 1 (Response Time Bound). Given a sporadic task set TaskSet and a task $\tau \in \mathit{TaskSet}$, for any $R > 0$ such that $R \geq W_\tau(R)$, any job $j$ of task $\tau$ in an FPP schedule $\sigma$ over an arrival sequence $\rho$ is completed by $a_\rho(j) + R$.


For instance, the smallest response time bound for a task $\tau \in \mathit{TaskSet}$ is the least positive fixed point of the function $W_\tau$. Using this response time bound, we can derive a schedulability criterion by requiring this bound to be smaller than or equal to the deadline of task $\tau$.
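The least positive fixed point of $W_\tau$ can be computed with the classic iteration $R \leftarrow W_\tau(R)$, which only increases because $W_\tau$ is nondecreasing. The Python sketch below assumes integer durations, positive WCETs and inter-arrival times, and that a larger priority number means a higher priority; it stops as soon as an iterate exceeds a given limit (here the deadline), which is the schedulability criterion described above. The task parameters in the example are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    wcet: int      # C_tau (assumed >= 1)
    min_sep: int   # delta_tau, minimal inter-arrival time (>= 1)
    deadline: int  # D_tau
    priority: int  # p_tau; assumption: larger number = higher priority


def max_workload(tasks: List[Task], tau: Task, delta: int) -> int:
    """W_tau(delta): sum of C_tau' * ceil(delta / delta_tau') over tasks with p_tau' >= p_tau."""
    return sum(t.wcet * -(-delta // t.min_sep)  # -(-a // b) is integer ceiling
               for t in tasks if t.priority >= tau.priority)


def response_time_bound(tasks: List[Task], tau: Task, limit: int) -> Optional[int]:
    """Least positive fixed point of W_tau via R <- W_tau(R) from R = 1, or None past `limit`."""
    r = 1
    while True:
        nxt = max_workload(tasks, tau, r)
        if nxt == r:
            return r
        if nxt > limit:
            return None
        r = nxt


def schedulable(tasks: List[Task]) -> bool:
    """Schedulability criterion: every task has a bound no larger than its deadline."""
    return all(response_time_bound(tasks, t, t.deadline) is not None for t in tasks)
```

For the lowest-priority task in the test below, the iterates are 1, 6, 7, 9, 10, 10, so its bound of 10 is within its deadline of 13; lowering that deadline to 9 makes the task fail the criterion.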


2.4 Implementation and Motivation for the Connection with RT-CertiKOS


The Prosa library includes functions to generate periodic traces and the corre- 
sponding FPP schedules, together with proofs of these properties and an instan- 
tiation of the schedulability criterion for these traces. This implementation was 
initially provided as a way to check that the modeling of the arrival model and 
scheduling policy are not contradictory and as such the implementation is as 
simple as possible. Although this is a good step in order to make the axiomatic 
definition of scheduling policies more acceptable, there is still room for improve- 
ment: these implementations are still rather ad-hoc and there is no connection 
to an actual system. This is where the link with RT-CertiKOS is beneficial to 
the Prosa ecosystem: it justifies that the model is indeed suitable for a concrete 
and independently developed real-time OS scheduler. 


3 The RT-CertiKOS OS Kernel


RT-CertiKOS [21], developed by the Yale FLINT group, is a real-time extension of the single-core sequential CertiKOS [9,13],² whose functional correctness has been mechanized in the Coq proof assistant [25]. The sequential restriction greatly simplifies the implementation of the OS kernel. However, the kernel does not support multicore, and the lack of kernel preemption can also degrade the responsiveness of the whole system. RT-CertiKOS proves spatial and temporal isolation (including schedulability) between components.

Both CertiKOS and RT-CertiKOS follow the same proof methodology, orga- 
nized around the notion of abstraction layers that permits decomposition of the 
kernel into small pieces that are easier to verify. 


² There is a multicore version of CertiKOS [14,15], but RT-CertiKOS is developed on top of the sequential version.
3.1 Abstraction Layers


Abstraction layers [13] are essentially a way to combine code fragments and their 
interface with simulation proofs. They consist of four elements: (a) a piece of 
code; (b) an underlay, the interface that the code relies on; (c) an overlay, the 
interface that the code provides; (d) a simulation proof ensuring that the code 
running on top of the underlay indeed provides the functionalities described in 
the overlay. 

Implementation details of lower layers are encapsulated in higher layers, allowing one to reason directly with the specifications rather than the implementation.

Notice that the underlay and overlay are specifications written in Coq and 
may be expressed using the semantics of several programming languages at once. 
This explains how CertiKOS (and RT-CertiKOS) manages to encompass both
C and assembly code verification into a unified framework. Notice further that 
this notion of interface not only includes functions but also some abstract state, 
which exposes memory states of lower layers in a clean and structured way, and 
allows the overlay to access them only by invoking verified functions. 


3.2 The Scheduler in RT-CertiKOS


RT-CertiKOS supports user-level fixed-priority preemptive scheduling. Its scheduler is invoked periodically by timer interrupts, dividing CPU time into intervals, which are called time slots, time quanta, or time slices.


Task Model. Each task in RT-CertiKOS is defined by a fixed priority, a period, 
and a budget (or WCET), the latter two being given in time slot units. Tasks 
are strictly periodic, with implicit hard deadlines, that is, the deadlines are the 
start of the next period and no deadline miss is allowed at all. While this is 
a restricted setting, it is enough to handle closed-loop control, used in control 
real-time systems. Furthermore, RT-CertiKOS only allows for fixed priorities in 
order to get maximum predictability, which is of utmost importance in critical 
systems. Finally, RT-CertiKOS also enforces budgets at the task level: in each 
period, a task cannot be scheduled for more than its specified budget. 


Fixed-Priority Scheduler. The RT-CertiKOS scheduler maintains an integer 
array to keep track of time quantum usage for each task. Upon invocation, the 
scheduler first iterates over all tasks, replenishing quotas whenever a new period 
arrives. It then loops again and finds the highest priority task that has not 
used up its budget, followed by a decrement on the chosen task’s current quota. 
Its abstraction is a Coq function that iterates over an abstract array of task 
control blocks, updates them, and returns the highest task identifier available 
for scheduling. 


Yield System Call. Tasks do not always use up their budgets. A task can yield 
to relinquish any remaining quota, so that lower-priority tasks may be scheduled
earlier and more time slots may be dedicated to non-real-time tasks.
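The scheduler and the yield system call described above can be sketched in C as follows. This is a hypothetical illustration, not RT-CertiKOS code: the names `rt_schedule`, `rt_yield`, `struct rt_task`, and `struct sched_state` are ours, a lower array index is assumed to mean a higher priority, and `IDLE` stands for the case where no real-time task is eligible (such slots could then go to non-real-time tasks).

```c
#include <assert.h>

#define MAX_TASKS 8 /* hypothetical upper bound on the number of periodic tasks */
#define IDLE (-1)   /* hypothetical marker: no real-time task is eligible */

/* Hypothetical task descriptor; index 0 is assumed to be the highest priority. */
struct rt_task {
    int period; /* in time slots */
    int budget; /* WCET used as a budget, in time slots */
};

/* Hypothetical scheduler state: current time and remaining quota of each task. */
struct sched_state {
    int ticks;            /* index of the time slot being scheduled */
    int quota[MAX_TASKS]; /* remaining quota in the current period */
};

/* Invoked on each timer interrupt; returns the task chosen for slot s->ticks. */
int rt_schedule(const struct rt_task *tasks, int n, struct sched_state *s)
{
    assert(n <= MAX_TASKS);
    /* First pass: replenish the quota of each task whose new period starts now. */
    for (int p = 0; p < n; p++)
        if (s->ticks % tasks[p].period == 0)
            s->quota[p] = tasks[p].budget;

    /* Second pass: the highest-priority task that still has quota wins. */
    int chosen = IDLE;
    for (int p = 0; p < n && chosen == IDLE; p++)
        if (s->quota[p] > 0)
            chosen = p;

    if (chosen != IDLE)
        s->quota[chosen]--; /* charge the chosen task for this slot */
    s->ticks++;
    return chosen;
}

/* Yield: the running task relinquishes whatever quota it has left this period. */
void rt_yield(struct sched_state *s, int current)
{
    if (current != IDLE)
        s->quota[current] = 0;
}
```

Replenishment happens before selection, so a task whose period starts at the current slot is immediately eligible; yielding only zeroes the quota, so the task becomes eligible again at the start of its next period.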
3.3 Proof Methodology


Based on sequential CertiKOS, RT-CertiKOS [21] follows the idea of deep
specifications³, in which the specification should be rich enough to deduce any
property of interest: there should never be any need to consider the implementation.
In particular, even though its source code is written in both C and assembly,
the underlay always abstracts the concrete memory states it operates on into
abstract states, and abstracts concrete code into Coq functions that act as
executable specifications. Subsequent layers relying on this underlay invoke these
Coq functions instead of the concrete code, thus hiding implementation details.

In the case of scheduling, there are essentially two functions: the scheduler 
and the yield system call. The scheduler relies on two concrete data structures: 
a counter tracking the current time (in time slot units) and an array tracking 
the current quota for each periodic task. The yield system call simply sets the 
remaining quota of the current task to zero. Both functions are verified in
RT-CertiKOS; that is, formal proofs ensure that their C code implementations
indeed simulate the corresponding Coq specifications.


3.4 Motivation for the Connection with Prosa 


Upgrading an OS kernel into a real-time one is not an easy task. When one 
further adds formal proofs about functional correctness, isolation, and timing 
requirements, the proof burden becomes enormous. In particular, there is still 
room for future work on RT-CertiKOS, e.g., a WCET analysis of its system 
calls. 

In order to reduce the overall proof burden, it is important to try to
delegate as much as possible to specialized libraries and tools. Thus, from the
RT-CertiKOS perspective, the benefit of using Prosa is precisely to have
state-of-the-art schedulability analyses already mechanized in Coq, without having to
prove all these results.

Furthermore, the schedulability check of Prosa is performed only once, when the
proofs are checked, so there is no runtime overhead and no loss of performance
for RT-CertiKOS.


4 From RT-CertiKOS to Prosa: A Schedule Connection


Prosa definitions cannot apply to RT-CertiKOS directly. Indeed, the perspectives 
of Prosa and RT-CertiKOS on the real-time aspects of a system are not the same, 
which is reflected in the differences in their task models, their executions, and 
the information they need. In this section, we explain how we bridge these gaps 
to actually perform the connection. Table 1 summarizes the various definitions 
and proofs and how they relate to each other. 


3 https://deepspec.org/
Table 1. Summary of the range of the various data between RT-CertiKOS and Prosa

Columns: RT-CertiKOS, Simplified Model, Interface, Prosa. Entries: scheduler;
quota array; schedule prefix with batch tasks; schedule prefix; infinite schedule;
valid schedule prefix; valid infinite schedule; FPP prefix; FPP; schedulability
analysis; schedulable execution; schedulable prefix.


4.1 Interface Between RT-CertiKOS and Prosa 


We design an interface to link RT-CertiKOS and Prosa, focusing on the precise 
amount of information that needs to be transmitted between them. The interface 
is shaped by the information Prosa needs to perform the schedulability analysis: 
a task set and a schedule, together with some properties. 


Key Elements of the Interface. The task model we consider is the one of 
RT-CertiKOS, as it is more restrictive than the ones supported by Prosa. Tasks
are defined by a priority level $p$, a period $T_p$, and a WCET (more accurately a
budget) $C_p$. Since we only allow one task per priority level, we identify tasks
and priority levels, and we write $C_p$, $D_p$, and $T_p$ instead of $C_\tau$, $D_\tau$, and $T_\tau$.
In order for this setting to make sense, we assume the following inequality for each
task $p$: $0 < C_p < T_p$. Notice that this is a particular case of Prosa's FPP task
model (Definition 4). The interface contains no explicit definition of jobs, as a job
can easily be defined from its task and a period number.

The second element Prosa needs is an infinite schedule. RT-CertiKOS cannot 
provide such an infinite schedule, as only a finite prefix can be known, up to the 
current time. Thus, we keep RT-CertiKOS’s finite schedule as is in the interface 
and it is up to Prosa to extend it into an infinite one, suitable for its analysis. 

Finally, Prosa needs two properties about the schedule: (a) any task receives 
no more service than its WCET in any period; (b) the schedule indeed follows 
the FPP policy. We refer to schedules satisfying these properties as valid schedule
prefixes. Proving these properties falls to RT-CertiKOS.


Handling Service and Job Cost. In RT-CertiKOS, and more generally in
any OS, we only assume a bound on the execution time of a task, used as a
budget. The exact execution time of each of its jobs is not known beforehand
and can be observed only at runtime. By contrast, Prosa assumes that the costs
of all jobs of all tasks are part of the problem description and thus are available
from the start.
To fix this mismatch, we define a job cost function computed from a schedule
prefix: its value is the actual service received if the job has yielded and the WCET 
of its task otherwise. This definition relies on the computation of service in any 
period, which we also provide as part of the interface. 
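The job cost function can be sketched over a concrete representation of the schedule prefix. We assume, hypothetically, that the prefix is an array of slots, each recording the scheduled task (or `IDLE`) and whether that task yielded during the slot, and that the slot in which a job yields counts toward its service; `service_in_period`, `job_yielded`, and `job_cost` are illustrative names, not the interface's Coq definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define IDLE (-1) /* hypothetical marker for a slot with no real-time task */

/* Hypothetical slot of a schedule prefix. */
struct slot {
    int task;     /* identifier (priority) of the scheduled task, or IDLE */
    bool yielded; /* whether that task yielded during this slot */
};

/* Service received by the n-th job (n >= 1) of task p within the first len slots. */
int service_in_period(const struct slot *prefix, int len, int p, int period, int n)
{
    int service = 0;
    for (int t = (n - 1) * period; t < n * period && t < len; t++)
        if (prefix[t].task == p)
            service++;
    return service;
}

/* Has the n-th job of task p yielded within the prefix? */
bool job_yielded(const struct slot *prefix, int len, int p, int period, int n)
{
    for (int t = (n - 1) * period; t < n * period && t < len; t++)
        if (prefix[t].task == p && prefix[t].yielded)
            return true;
    return false;
}

/* Job cost: actual service if the job yielded, WCET of its task otherwise. */
int job_cost(const struct slot *prefix, int len, int p, int period, int wcet, int n)
{
    if (job_yielded(prefix, len, p, period, n))
        return service_in_period(prefix, len, p, period, n);
    return wcet;
}
```

A job that has not yielded within the prefix, including one whose period extends past the end of the prefix, is charged its full WCET, which is the worst case.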


4.2 The RT-CertiKOS Side


Adding the Schedule in RT-CertiKOS. RT-CertiKOS only maintains the
current state of the system, which the scheduler relies on, such as the current time 
and quota array. However, the interface requires a schedule trace. We introduce 
such a ghost variable in RT-CertiKOS, and update a few scheduling-related 
primitives to extend this trace whenever a task is scheduled. 

This introduction adds absolutely no proof overhead: since it does not affect
the scheduling decisions, existing proofs about the rest of the system still
hold. Furthermore, it is a purely logical variable introduced through refinement,
meaning that it does not exist in the C code, so it causes no computation
overhead.


Too Much Information in RT-CertiKOS. The full RT-CertiKOS model
contains too much information compared to what the interface requires.

Firstly, services in RT-CertiKOS may affect a part of the state that is relevant 
to practical scheduling, but is of no interest to the scheduling model we want to 
verify, like batch tasks. 

Secondly, due to the nature of deep specifications, the abstraction of the whole
scheduling operation contains more information than what is required for
reasoning about real-time properties. For example, saving and restoring registers is
essential for the correctness of context switches (thus, of the scheduler), but it
is irrelevant to temporal properties.

Thirdly, specifications in RT-CertiKOS enumerate preconditions of the scheduler,
such as the correct configuration of the paging bit in the control register, the
validity of the current stack, and so on. These are required for other invariants of
the kernel at other abstraction levels, but again they are irrelevant to scheduling.


Simplified Model of RT-CertiKOS. For all these reasons, we define a
simplified scheduling model of RT-CertiKOS, with a much simpler abstract state
containing only the data structures that are actually used in scheduling, from
which the interface data and its properties must be derived. This simplified
abstract state contains four fields:


ticks the current time, that is, the number of past time slots;
quanta a map giving the remaining quota for each priority;
cid the identifier of the running process (if it exists);
schedule the schedule prefix remembering past scheduling decisions.


This abstract state is not equivalent to the complete one, because it operates 
on a totally different abstract data type where all irrelevant fields are removed. 
It is also more permissive: more transitions are allowed since it does not perform
the sanity checks about preconditions such as being in kernel mode, host mode,
etc. Nevertheless, we still have a simulation: any step in the full RT-CertiKOS
is also allowed in the simplified version and results in the same scheduling
decision and trace. This simulation is enough for our purposes as we are ultimately
interested in the behavior of the full RT-CertiKOS.


Proving the Properties Required by Prosa. The interface requires two 
key properties: (a) the service received by each job is at most the WCET of its 
task; and (b) the schedule prefix follows FPP. These properties must be proven 
on the RT-CertiKOS side for any schedule that might be generated. This way, 
Prosa can rely on them through the interface. 

Since RT-CertiKOS verification is based on state invariants rather than 
traces, we prove these properties using the following main invariants on the 
simplified scheduling model: 


– the length of the schedule trace is the current time plus 1 (the scheduler takes
a decision for the next time slot);

– if a task has yielded in the current period, its remaining quota is 0;

– the service plus the remaining quota is equal to the job cost;

– the service received in any period is at most the WCET;

– pending jobs have two equivalent definitions (having positive remaining quota
or having less service than their job cost);

– the current schedule follows FPP.


To prove that these statements are indeed invariants, we must prove that they 
are preserved by any step, that is, by the scheduler (triggered by the user-level 
timer interrupt) and by the yield system call (triggered by the user process), 
since all other kernel steps do not modify the scheduling data of the simplified 
scheduling model. 
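To make these invariants concrete, the following sketch renders the simplified state in C, implements the two steps that modify it (timer-driven scheduling and yield), and checks several of the invariants dynamically: trace length, service plus quota equals job cost (which also covers the yield case), service bounded by the WCET, and FPP over the whole prefix. Checking invariants on concrete runs is only a testing aid, not a substitute for the Coq proofs; the struct layout, the function names, the `yielded` flags stored with the prefix, and the convention that index 0 is the highest priority are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 8
#define MAX_SLOTS 256
#define IDLE (-1)

struct task { int period, wcet; }; /* index 0 = highest priority (assumed) */

/* Hypothetical C rendering of the simplified abstract state. */
struct simple_state {
    int ticks;               /* current time slot */
    int quanta[MAX_TASKS];   /* remaining quota per priority */
    int cid;                 /* running task, or IDLE */
    int sched[MAX_SLOTS];    /* schedule prefix: task chosen for each slot */
    bool yielded[MAX_SLOTS]; /* whether the task of that slot yielded */
    int len;                 /* length of the schedule prefix */
};

/* Decide slot s->ticks: replenish, pick the highest priority with quota, record it. */
static void decide(const struct task *ts, int n, struct simple_state *s)
{
    assert(n <= MAX_TASKS && s->len < MAX_SLOTS);
    for (int p = 0; p < n; p++)
        if (s->ticks % ts[p].period == 0)
            s->quanta[p] = ts[p].wcet;
    s->cid = IDLE;
    for (int p = 0; p < n && s->cid == IDLE; p++)
        if (s->quanta[p] > 0)
            s->cid = p;
    if (s->cid != IDLE)
        s->quanta[s->cid]--;
    s->sched[s->len] = s->cid;
    s->yielded[s->len] = false;
    s->len++;
}

void sched_init(const struct task *ts, int n, struct simple_state *s)
{
    *s = (struct simple_state){0};
    decide(ts, n, s);
}

/* Scheduler step, triggered by the timer interrupt. */
void timer_tick(const struct task *ts, int n, struct simple_state *s)
{
    s->ticks++;
    decide(ts, n, s);
}

/* Yield step, triggered by the running task. */
void sys_yield(struct simple_state *s)
{
    if (s->cid != IDLE) {
        s->quanta[s->cid] = 0;
        s->yielded[s->len - 1] = true;
    }
}

/* Service of task p in slots [from, to) of the prefix. */
static int service(const struct simple_state *s, int p, int from, int to)
{
    int k = 0;
    for (int t = from; t < to && t < s->len; t++)
        if (s->sched[t] == p)
            k++;
    return k;
}

/* Cost of p's job released at `start`: actual service if it yielded, WCET otherwise. */
static int cost(const struct task *ts, const struct simple_state *s, int p, int start)
{
    int end = start + ts[p].period;
    for (int t = start; t < end && t < s->len; t++)
        if (s->sched[t] == p && s->yielded[t])
            return service(s, p, start, end);
    return ts[p].wcet;
}

bool invariants_hold(const struct task *ts, int n, const struct simple_state *s)
{
    if (s->len != s->ticks + 1)                          /* trace length = time + 1 */
        return false;
    for (int p = 0; p < n; p++) {
        int start = s->ticks / ts[p].period * ts[p].period;
        int srv = service(s, p, start, s->len);
        if (srv + s->quanta[p] != cost(ts, s, p, start)) /* service + quota = cost */
            return false;
        if (srv > ts[p].wcet)                            /* service at most the WCET */
            return false;
    }
    for (int t = 0; t < s->len; t++) {                   /* FPP over the whole prefix */
        int limit = s->sched[t] == IDLE ? n : s->sched[t];
        for (int q = 0; q < limit; q++) {                /* every higher priority */
            int start = t / ts[q].period * ts[q].period;
            if (service(s, q, start, t) < cost(ts, s, q, start))
                return false;                            /* q was pending but not chosen */
        }
    }
    return true;
}
```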


Simulation Between the Simplified Scheduling Model and RT-CertiKOS.
To connect the full RT-CertiKOS model and the simplified one,
we define a projection function RData_proj extracting the relevant fields from 
the full RT-CertiKOS state to build the simplified one. 

As shown in Fig. 2, we prove that given a scheduler transition of RT-CertiKOS
between the (full) states d and d', there is also a transition from their
projections s and s' by invoking the simplified scheduler⁴. If the states d and s satisfy
respectively the invariants for RT-CertiKOS and the simplified model, then so
do d' and s' (they are invariants). As the states s and s' are projections of d
and d', the invariants of s and s' also hold on the corresponding fields in d and d'.
This allows us to use the invariants proved in the simplified model to
establish properties on the full state of RT-CertiKOS. Notice that the schedulability
property we study is a safety property (deadlines are never missed) and not a
liveness one (everything is eventually scheduled).
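The shape of this simulation argument can be illustrated with a toy executable model. The sketch below is hypothetical throughout: `struct full` stands in for the full RT-CertiKOS state with a few timing-irrelevant fields (saved registers, a paging flag, batch tasks), `proj` only plays the role of RData_proj, and the test checks on concrete runs that projecting after a full step gives the same state as a simplified step from the projection, and that the full model may refuse steps that the simplified one would allow.

```c
#include <assert.h>
#include <stdbool.h>

#define NT 2          /* hypothetical number of periodic tasks */
#define MAX_SLOTS 64
#define IDLE (-1)
#define BATCH 100     /* hypothetical id meaning "a batch task runs" */

static const int period[NT] = {2, 4};
static const int wcet[NT] = {1, 1};

/* Simplified scheduling state: scheduling data only. */
struct simple {
    int ticks, quanta[NT], cid, len, sched[MAX_SLOTS];
};

/* Hypothetical full state: the same data plus fields irrelevant to timing. */
struct full {
    int ticks, quanta[NT], cid, len, sched[MAX_SLOTS];
    unsigned regs[4], saved_regs[4]; /* context-switch data */
    bool paging_on;                  /* a precondition only the full model checks */
    int batch_ready;                 /* ready non-real-time (batch) tasks */
};

/* Projection in the spirit of RData_proj: keep scheduling fields, hide batch tasks. */
struct simple proj(const struct full *d)
{
    struct simple s = { .ticks = d->ticks, .len = d->len,
                        .cid = d->cid == BATCH ? IDLE : d->cid };
    for (int p = 0; p < NT; p++)
        s.quanta[p] = d->quanta[p];
    for (int t = 0; t < d->len; t++)
        s.sched[t] = d->sched[t];
    return s;
}

/* Decision logic shared by both models: replenish, then pick and charge a task. */
static int pick(int ticks, int quanta[NT])
{
    for (int p = 0; p < NT; p++)
        if (ticks % period[p] == 0)
            quanta[p] = wcet[p];
    for (int p = 0; p < NT; p++)
        if (quanta[p] > 0) {
            quanta[p]--;
            return p;
        }
    return IDLE;
}

/* Simplified step: no sanity checks, hence more permissive. */
void simple_step(struct simple *s)
{
    assert(s->len < MAX_SLOTS);
    s->ticks++;
    s->cid = pick(s->ticks, s->quanta);
    s->sched[s->len++] = s->cid;
}

/* Full step: checks a precondition, saves registers, may run a batch task. */
bool full_step(struct full *d)
{
    if (!d->paging_on)
        return false;                        /* step not allowed in this state */
    assert(d->len < MAX_SLOTS);
    for (int i = 0; i < 4; i++)
        d->saved_regs[i] = d->regs[i];
    d->ticks++;
    int rt = pick(d->ticks, d->quanta);
    d->sched[d->len++] = rt;                 /* ghost trace records real-time decisions */
    d->cid = (rt == IDLE && d->batch_ready > 0) ? BATCH : rt;
    return true;
}

/* Field-by-field equality of simplified states. */
bool simple_eq(const struct simple *a, const struct simple *b)
{
    if (a->ticks != b->ticks || a->cid != b->cid || a->len != b->len)
        return false;
    for (int p = 0; p < NT; p++)
        if (a->quanta[p] != b->quanta[p])
            return false;
    for (int t = 0; t < a->len; t++)
        if (a->sched[t] != b->sched[t])
            return false;
    return true;
}
```

Only one direction is exercised, matching the text: every full step has a matching simplified step, which is what transfers safety invariants from the simplified model to the full one.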


4 More precisely, we prove that certikos_sched(s) and RData_proj(d') are
extensionally equal.
Fig. 2. Simulation between simplified scheduling and RT-CertiKOS


4.3 The Prosa Side 


Proven Schedulability Analysis in Prosa. In order to use the response time
bound of Sect. 2, we need to relate any finite schedule prefix from the interface
to an arrival sequence and a schedule satisfying the model described in Sect. 2. 
We can then rely on any schedulability criterion (e.g., the one described at the 
end of Sect. 2.3) to prove that the response time bound holds and deduce that 
any valid schedule prefix from the interface is indeed schedulable. 
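For intuition about what such a schedulability criterion computes, the classical response-time analysis for fixed-priority preemptive scheduling with implicit deadlines is sketched below in C. It is shown only as an example of a criterion of this kind; we do not claim it is the exact criterion formalized in Prosa, and `response_time` and `schedulable` are illustrative names. Index 0 is assumed to be the highest priority.

```c
#include <assert.h>
#include <stdbool.h>

struct task { int period; int wcet; }; /* index 0 = highest priority (assumed) */

/* Classical fixed-point iteration R = C_i + sum_{j < i} ceil(R / T_j) * C_j.
   Returns the response-time bound of task i, or -1 if it exceeds D_i = T_i. */
int response_time(const struct task *ts, int i)
{
    int r = ts[i].wcet, prev = 0;
    while (r != prev) {
        if (r > ts[i].period)
            return -1;                  /* implicit deadline would be missed */
        prev = r;
        r = ts[i].wcet;
        for (int j = 0; j < i; j++)     /* interference of higher priorities */
            r += (prev + ts[j].period - 1) / ts[j].period * ts[j].wcet;
    }
    return r;
}

/* The task set is accepted if every task has a bound within its period. */
bool schedulable(const struct task *ts, int n)
{
    for (int i = 0; i < n; i++)
        if (response_time(ts, i) < 0)
            return false;
    return true;
}
```

The iteration is monotone and stops either at a fixed point (the bound) or as soon as the candidate exceeds the deadline, so it always terminates.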


Bridging the Gap Between the Interface and Prosa. The interface provides
Prosa with a task set, service and job cost functions, and a valid schedule
prefix. We first build an arrival sequence from the schedule prefix where the n-th
job ($n > 0$) of a given task $p$ arrives at time $(n - 1) \times T_p$, with the cost given
by the interface. Note that jobs that do not arrive within the prefix cannot have
yielded yet, so their cost is the WCET of their task: we assume the worst
case for the future.

The arrival sequence is then defined by adding all jobs of each task $p$ from
TaskSet, that is, the arrival sequence at time $t$ contains the $(\lfloor t/T_p \rfloor + 1)$-th job
of $p$ iff $t$ is divisible by $T_p$.
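The construction of the arrival sequence can be mirrored in C: at time t, the jobs arriving are exactly those of tasks whose period divides t, and the index of each such job is ⌊t/T_p⌋ + 1. The encoding of a job as a (task, index) pair and the function names are assumptions for the illustration; the job costs, which the interface supplies, are left out.

```c
#include <assert.h>

#define MAX_TASKS 8

struct task { int period; int wcet; };

/* A job: task identifier and index n >= 1 (the n-th job of that task). */
struct job { int task; int n; };

/* Fill `out` with the jobs arriving at time t; return how many there are. */
int arrivals_at(const struct task *ts, int ntasks, int t, struct job out[MAX_TASKS])
{
    int k = 0;
    assert(ntasks <= MAX_TASKS);
    for (int p = 0; p < ntasks; p++)
        if (t % ts[p].period == 0)              /* a new period of p starts at t */
            out[k++] = (struct job){ p, t / ts[p].period + 1 };
    return k;
}

/* Arrival time of the n-th job of its task: (n - 1) * T_p. */
int arrival_time(const struct task *ts, struct job j)
{
    return (j.n - 1) * ts[j.task].period;
}
```

The two functions are inverse views of the same definition: every job returned by `arrivals_at(ts, n, t, out)` has `arrival_time` equal to t.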

Next, we need to turn the finite schedule prefix into an infinite one. There are 
two possibilities: either build a full schedule from the arrival sequence using the 
Prosa implementation of FPP, or start from the schedule prefix of the interface 
and extend it into an infinite one. With the first technique, the fact that the
infinite schedule satisfies Prosa's FPP model comes for free; the difficulty lies in proving
that the schedule prefix from the interface is indeed a prefix of this infinite schedule.
The second technique starts from the schedule prefix and the difficulty is proving 
that it satisfies the FPP model as specified on the Prosa side. 

In this paper, we use the first strategy and prove that the prefix of the 
schedule built by Prosa is equal to the schedule prefix provided in the interface. 
To do so, we use the fact that two FPP schedule prefixes with the same arrival 
sequence and job costs (only known at runtime) are the same, provided we take 
care to properly remember when jobs yield. 
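The key fact used here, that FPP with a fixed arrival sequence and fixed job costs determines a unique schedule, suggests a simple executable check: rebuild the FPP schedule slot by slot from the arrivals and the job costs, then compare it with the prefix. The sketch below is hypothetical (the names, the cost-callback interface, and index 0 as highest priority are our assumptions), and it assumes, as the schedulability result guarantees, that every job completes within its period, so unfinished work never carries over.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 8
#define IDLE (-1)

struct task { int period; int wcet; }; /* index 0 = highest priority (assumed) */

/* Hypothetical job-cost callback: cost of the n-th job (n >= 1) of task p. */
typedef int (*cost_fn)(int p, int n);

/* Build the first `len` slots of the FPP schedule for the given arrivals and costs. */
void fpp_build(const struct task *ts, int ntasks, cost_fn cost, int len, int out[])
{
    int served[MAX_TASKS] = {0};            /* service of each task's current job */
    assert(ntasks <= MAX_TASKS);
    for (int t = 0; t < len; t++) {
        for (int p = 0; p < ntasks; p++)
            if (t % ts[p].period == 0)
                served[p] = 0;              /* a new job of p arrives at t */
        out[t] = IDLE;
        for (int p = 0; p < ntasks; p++) {
            int n = t / ts[p].period + 1;   /* index of p's current job */
            if (served[p] < cost(p, n)) {   /* highest-priority pending job runs */
                out[t] = p;
                served[p]++;
                break;
            }
        }
    }
}

/* Slot-by-slot comparison of two schedule prefixes of length len. */
bool same_prefix(const int *a, const int *b, int len)
{
    for (int t = 0; t < len; t++)
        if (a[t] != b[t])
            return false;
    return true;
}

/* Example costs: task 0 always needs 1 slot; task 1's first job yielded after
   1 slot, so its cost is 1; later jobs of task 1 are charged their WCET of 2. */
int example_cost(int p, int n)
{
    if (p == 0)
        return 1;
    return n == 1 ? 1 : 2;
}
```

Feeding `example_cost` for the tasks (T = 2, C = 1) and (T = 4, C = 2) into `fpp_build` over 8 slots yields 0, 1, 0, IDLE, 0, 1, 0, 1: the slot left idle at time 3 is exactly where the yield of task 1's first job is remembered through its job cost.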
Assuming that the task set is accepted by the schedulability criterion, we
know that the Prosa schedule is schedulable and, since this implies that its 
prefix is also schedulable, we deduce that the valid schedule prefix given by the 
interface is schedulable. 


5 Evaluation and Future Work 


5.1 Evaluation 


As the C and assembly source code of RT-CertiKOS was not modified at all, 
this connection does not introduce any overhead to its performance and there is 
no need for a new performance evaluation. Instead, we focus on the benefits this
work brings and on the amount of work involved, described in Table 2.


Benefits for RT-CertiKOS and Prosa. The schedulability analysis already
present in RT-CertiKOS was proved manually and took around 8k LoC to
handle the precise setting described in this paper. By contrast, interfacing with
Prosa requires 50% less proof, is more flexible, and can easily be extended (see
Sect. 5.3). The introduction of a simplified scheduling model also reduced the
size of the proofs of invariants about the high-level abstract scheduler by 75%, since we
are freed from the unnecessary information described in Sect. 4.2.

On the Prosa side, having a complete formal connection with an actual OS 
kernel developed independently validates the modeling choices made for
describing real-time systems. Indeed, seeing schedulers as predicates over scheduling
traces is very general but one can legitimately wonder whether such predicates 
accurately describe reality. 


Proof Effort. Designing a good interface allowed us to cleanly separate the 
work required on the RT-CertiKOS and Prosa sides. 

On the RT-CertiKOS side, the design of the simplified scheduling setting was 
pretty straightforward, as was the correctness of the translation. Indeed, this 
translation is essentially a projection, except for batch tasks which are removed. 
Designing adequate inductive invariants to prove the two properties required by 
the interface was the most challenging part of this work and unsurprisingly, it 
took several iterations to find correct definitions. 

On the Prosa side, building the arrival sequence and the infinite schedule is 
quite effortless given a prefix and a job cost function. The subtle part was
finding a good definition of the job cost function, which made the corresponding
proofs significantly easier. Proving that the prefix of the built infinite schedule
is the same as the interface prefix w.r.t. executions was troublesome for two 
reasons. First, the interface prefix contains an additional boolean representing 
whether the scheduled job yielded and which is used for computing job costs, 
whereas it does not exist in the built schedule. Second, the definition of the FPP 
property in the interface depends on a schedule prefix, while the one in Prosa 
depends on an infinite schedule. 
Overall, we see the small amount of LoC required to perform this work as a
validation of the adequacy of our method to the considered problem.


Table 2. Proof effort

Feature                                                      Changes (LoC)
Adding a schedule field to RT-CertiKOS                                  15
Interface (with proofs)                                                380
Simplified scheduling                                                  100
Proving the invariants about the simplified scheduling                 950
Translation between RT-CertiKOS and simplified scheduling              380
Conversion between ZArith and SSReflect                                280
Translation between interface and Prosa                               1900
Using the schedulability analysis of Prosa                             130
Total                                                                 4135


5.2 Lessons Learned 


Beyond the particular artifact linking RT-CertiKOS with Prosa, what more
general lessons can we learn from this connection?

First, using the same proof assistant greatly helps. Indeed, beyond the
absence of technical hassle of interoperability between different formal tools,
it also avoids the pitfall of a formalization mismatch between both formal
models and permits sharing common definitions.

Second, the creation of an explicit interface between both tools clearly marks 
the flow of information, stays focused on the essential information, and delimits 
the “proof responsibility”: which side is responsible for proving which fact. It 
also segregates the proof techniques used on each side so as not to pollute the
other one, either on a technical aspect (vanilla Coq for RT-CertiKOS vs the
SSReflect extension for Prosa) or on the verification methods used
(invariant-based properties for RT-CertiKOS vs trace-based properties for Prosa). This
separation makes it unnecessary for anyone to be an expert in both tools at once:
once the interface was clearly defined, experts on each side could work with only
a rough description of the other one, even though this interface required a few
later changes. In particular, it is interesting to notice that half the authors are
experts in RT-CertiKOS whereas the other half are experts in Prosa.

Third, the common part of the models used by both sides must be amenable 
to agreement: in our case, this means having the same notion of time (scheduling 
slots, or ticks) and a compatible notion of schedule (finite and infinite). 

Finally, we expect the interface we designed to be reusable for other verified 
kernels wanting to connect to Prosa or for linking RT-CertiKOS to other formal 
schedulability analysis tools. 
5.3 Future Work


Evolving with RT-CertiKOS. The existing implementation of the scheduler
in RT-CertiKOS imposes a fixed-priority scheduling policy with implicit
deadlines. In the future, as RT-CertiKOS evolves and supports more task models,
the interface connecting it with Prosa should be extended accordingly.

A straightforward extension is to allow constrained deadlines, that is, to have
the deadline $D_p$ be shorter than the period $T_p$ (but greater than the WCET $C_p$),
as the schedulability result we use from Prosa already supports it. This requires
RT-CertiKOS to support an extended task model where a task is also specified
by its deadline. Furthermore, RT-CertiKOS would also need to enforce budgets
at the deadlines, instead of at the beginning of the next period as is currently
the case.

Another extension would be to consider the Earliest Deadline First (EDF)
scheduling policy, which provides a better utilization ratio. In addition to relaxing
the current task model by not including priorities, the main proof effort would
be to implement and verify this new scheduler in RT-CertiKOS.


Extensions to Prosa. Our experience connecting RT-CertiKOS and Prosa
shows that Prosa's assumption of having an infinite schedule is quite
impractical when verifying instances of real-time systems. This advocates for building
reusable connections between Prosa's system model based on infinite traces and
a model similar to the one used in the interface with RT-CertiKOS. Thus, one
would prove analyses in the convenient setting of infinite traces and still be able
to apply them to lower-level models of real-time systems with finite traces.


6 Related Work 


Schedulability Analysis. Schedulability analysis, as a key theory in the
real-time community, has been widely studied in the past decades. Liu and Layland's
seminal work [20] presents a schedulability analysis technique for a simple system
model described as a set of assumptions. Many later works [3,5,11,23,28] aim
to capture more realistic⁵ and complex system models by generalizing those
assumptions.

In order to provide formal guarantees for those results, several formal
approaches have been used for the formalization of schedulability analyses, such as
model checking [8,12,16], temporal logic [32,33], and theorem proving [10,30].

As far as we know, none of the above work has been applied to a formally 
verified OS kernel. 


Verification of Real-Time OS Kernels. There is a lot of work on formal
verification of OS kernels; see [18] for a survey. Therefore, we restrict our
attention to verification of real-time kernels using proof assistants. We also do 


5 In terms of executions and arrival model. 
not consider WCET computation, be it of the kernel itself (e.g., [6,24]) or of
the task set we consider. This is a complementary but clearly distinct task to 
get verified time bounds. 

The eChronos OS [1,2] is a real-time OS running on single-core embedded 
systems. It stops its verification at the scheduling policy level, proving that 
the currently running task always has the highest priority among ready tasks. 
Xu et al. [31] verify the functional correctness of µC/OS-II [19], a real-time
operating system with optimizations such as bitmaps. They also prove some 
high level properties, such as priority inversion freedom of shared memory IPC. 

RT-CertiKOS [21] is a verified single-core real-time OS kernel developed by 
the Yale FLINT group, based on sequential CertiKOS [9,13]. It proves both
temporal and spatial isolation among different components, where temporal isolation
entails schedulability, etc. However, as explained in Sect. 5.1, its schedulability 
proof is longer whereas connecting to an existing schedulability analyzer is easier 
and more flexible. 


7 Conclusion 


Formal verification aims at providing stronger guarantees than testing.
Real-time systems are a good target because they are often part of critical systems.
Both the scheduling and OS communities have developed their own formally 
verified tools but there is a lack of integration between them. In this paper, 
we make a first step toward bridging this gap by integrating a formally proven 
schedulability analysis tool, Prosa, with a verified sequential real-time OS kernel, 
RT-CertiKOS. This gives two benefits: first, it provides RT-CertiKOS with a 
modular, extensible, state-of-the-art formal schedulability analysis proof; second, 
it gives a concrete instance of one of the scheduling theories described in Prosa, 
thus ensuring that its model is consistent and applicable to actual systems. 
We believe this connection can be easily adapted for other verified kernels or 
schedulability analyzers. 

It also showcases that it is possible and practical to connect two completely 
independent medium- to large-scale formal proof developments. 
Abstract. Formal verification of concurrent operating systems (OSs) is
challenging, in particular the verification of dynamic memory
management, due to its complex data structures and allocation algorithm.
To the best of our knowledge, this paper presents the first formal specification
and mechanized proof of a concurrent buddy memory allocation
for a real-world OS. We develop a fine-grained formal specification of
the buddy memory management in Zephyr RTOS. To ease validation of
the specification and the source code, the provided specification closely
follows the C code. Then, we use the rely-guarantee technique to conduct
the compositional verification of functional correctness and invariant
preservation. During the formal verification, we found three bugs in
the C code of Zephyr.


1 Introduction 


The operating system (OS) is a fundamental component of critical systems.
Thus, correctness and reliability of systems highly depend on the system's
underlying OS. As a key functionality of OSs, memory management provides ways
to dynamically allocate portions of memory to programs at their request, and to
free them for reuse when no longer needed. Since program variables and data are
stored in the allocated memory, an incorrect specification and implementation
of the memory management may lead to system crashes or exploitable attacks
on the whole system. RTOSs are frequently deployed in critical systems,
making formal verification of RTOSs necessary to ensure their reliability. One
state-of-the-art RTOS is Zephyr RTOS [1], a Linux Foundation project. Zephyr
is an open source RTOS for connected, resource-constrained devices, built
with security and safety in mind. Zephyr uses a buddy memory allocation
algorithm optimized for RTOSs that allows multiple threads to concurrently
manipulate shared memory pools with fine-grained locking.

Formal verification of the concurrent memory management in Zephyr is
challenging. (1) To achieve high performance, data structures and algorithms
in Zephyr are laid out in a complex manner. The buddy memory allocation
can split large blocks into smaller ones, allowing blocks of different sizes to
be allocated and released efficiently while limiting memory fragmentation
concerns. Seeking performance, Zephyr uses a multi-level structure where each level
has a bitmap and a linked list of free memory blocks. The levels of bitmaps
actually form a forest of quad trees of bits. Memory addresses are used as
references to memory blocks, so the algorithm has to deal with address alignment
and computations concerning the block size at each level, increasing the
complexity of its verification. (2) A complex algorithm and data structures also imply
complex invariants that the formal model must preserve. These invariants
have to guarantee that the bitmaps are well-shaped and consistent with the free lists.
To prevent memory leaks and block overlapping, precise reasoning must keep track
of both numerical and shape properties. (3) Thread preemption and fine-grained
locking make the kernel execution of memory services concurrent.
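The quad-tree structure formed by the per-level bitmaps can be navigated with simple index arithmetic. The sketch below is a hypothetical illustration, not Zephyr's code: a block is named by its level and its index within that level, its parent sits at index b/4 one level up, its four children at indices 4b to 4b + 3 one level down, and its buddies are the children of its parent. We assume, for illustration only, one bit per block at each level, a set bit marking a free block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A block is named by its level and its index within that level. */
struct blk { int level; uint32_t index; };

/* Parent (only meaningful for level > 0): one level up, index divided by four. */
struct blk parent(struct blk b) { return (struct blk){ b.level - 1, b.index / 4 }; }

/* k-th child (k = 0..3): one level down, indices 4b .. 4b + 3. */
struct blk child(struct blk b, uint32_t k) { return (struct blk){ b.level + 1, 4 * b.index + k }; }

/* The four buddies of b share its parent: indices 4*(b/4) .. 4*(b/4) + 3. */
uint32_t first_buddy(struct blk b) { return b.index & ~3u; }

/* Hypothetical per-level bitmap, one bit per block (set = free). */
bool test_bit(const uint32_t *bits, uint32_t i) { return (bits[i / 32] >> (i % 32)) & 1u; }
void set_bit(uint32_t *bits, uint32_t i) { bits[i / 32] |= 1u << (i % 32); }
void clear_bit(uint32_t *bits, uint32_t i) { bits[i / 32] &= ~(1u << (i % 32)); }

/* True if all four buddies of b are free, i.e. they could be merged into their parent. */
bool buddies_all_free(const uint32_t *level_bits, struct blk b)
{
    uint32_t f = first_buddy(b);
    for (uint32_t k = 0; k < 4; k++)
        if (!test_bit(level_bits, f + k))
            return false;
    return true;
}
```

For instance, block 13 at level 2 has parent 3 at level 1 and shares that parent with blocks 12, 14, and 15.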

In this paper, we apply the rely-guarantee reasoning technique to the
concurrent buddy memory management in Zephyr. This work uses π-Core, a
rely-guarantee framework for the specification and verification of concurrent reactive
systems. π-Core introduces a concurrent imperative system specification
language driven by "events" that supports reactive semantics of interrupt handlers
(e.g. kernel services, scheduler) in OSs, and thus makes the formal specification of
Zephyr simpler. The language embeds Isabelle/HOL data types and functions;
therefore it is as expressive as Isabelle/HOL itself. π-Core concurrent constructs
allow the specification of Zephyr multi-thread interleaving, fine-grained locking,
and thread preemption. The compositionality of rely-guarantee makes it feasible to
prove the functional correctness of Zephyr and invariants over its data
structures. The formal specification and proofs are developed in Isabelle/HOL. They
are available at https://lvpgroup.github.io/picore/.

We first analyze the structural properties of memory pools in Zephyr (Sect. 3). 
The properties clarify the constraints and consistency of quad trees, free block 
lists, memory pool configuration, and waiting threads. All of them are defined as 
invariants whose preservation under the execution of services is formally
verified. From the well-shaped properties of quad trees, we can derive a critical
property to prevent memory leaks, i.e., memory blocks cover the whole memory
address range of the pool but do not overlap each other.
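The no-leak, no-overlap property can also be stated as an executable check. Assuming, as in Zephyr, that block b at level l starts at offset b × (max_sz/4^l) from the start of the buffer, the hypothetical function below verifies that a given set of blocks covers every min_sz-sized unit of the pool exactly once; a doubly covered unit is an overlap and an uncovered unit is a leak.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct blk { int level; unsigned index; };

/* Check that the blocks tile [0, n_max * max_sz): each min_sz unit covered exactly once. */
bool blocks_partition_pool(const struct blk *bs, int nb, unsigned n_max,
                           unsigned max_sz, unsigned min_sz)
{
    unsigned units = n_max * (max_sz / min_sz);
    unsigned char *cover = calloc(units, 1);
    bool ok = cover != NULL;
    for (int i = 0; ok && i < nb; i++) {
        unsigned size = max_sz >> (2 * bs[i].level);   /* max_sz / 4^level */
        if (size < min_sz) {
            ok = false;                                /* below the minimum block size */
            break;
        }
        unsigned count = size / min_sz, first = bs[i].index * count;
        if (first + count > units) {
            ok = false;                                /* block lies outside the buffer */
            break;
        }
        for (unsigned u = first; u < first + count; u++)
            if (cover[u]++) {
                ok = false;                            /* unit already covered: overlap */
                break;
            }
    }
    for (unsigned u = 0; ok && u < units; u++)
        if (!cover[u])
            ok = false;                                /* uncovered unit: leak */
    free(cover);
    return ok;
}
```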

Together with the formal verification of Zephyr, we aim at the highest
evaluation assurance level (EAL 7) of Common Criteria (CC) [2], which was declared
this year as the candidate standard for security certification by the Zephyr
project. Therefore, we develop a fine-grained low-level formal specification of
the buddy memory management (Sect. 4). The specification has a line-to-line
correspondence with the Zephyr C code, and thus enables the code-to-spec
review required by the EAL 7 evaluation, covering all the data structures and
imperative statements present in the implementation.

We carry out the formal verification of functional correctness and invariant
preservation using a rely-guarantee proof system (Sect. 5), which supports
total correctness for loops where fairness does not need to be considered. The 
formal verification revealed three bugs in the C code: an incorrect block split, an 
incorrect return from the kernel services, and non-termination of a loop (Sect. 6). 
Two of them are critical and have been repaired in the latest release of Zephyr. 
The third bug causes nontermination of the allocation service when trying to 
allocate a block of a larger size than the maximum allowed. 


Related Work. (1) Memory models [17] provide the necessary abstraction to
separate the behaviour of a program from the behaviour of the memory it reads
and writes. There are many formalizations of memory models in the literature,
e.g., [10,14,15,19,21], some of which only create an abstract specification
of the services for memory allocation and release [10,15,21]. (2) Formal
verification of OS memory management has been studied in CertiKOS [11,20], seL4
[12,13], Verisoft [3], and in the hypervisors from [4,5], where only the works
in [4,11] consider concurrency. Compared to buddy memory allocation, the
data structures and algorithms verified in [11] are relatively simple, without
block split/coalescence and multiple levels of free lists and bitmaps. [4] only
considers virtual mapping but not allocation or deallocation of memory areas.
(3) Algorithms and implementations of dynamic memory allocation have been
formally specified and verified in an extensive number of works [7–9,16,18,23].
However, buddy memory allocation is only studied in [9], which does not
consider concrete data structures (e.g. bitmaps) and concurrency. To the best of
our knowledge, this paper presents the first formal specification and mechanized
proof for a concurrent buddy memory allocation of a realistic operating system.


2 Concurrent Memory Management in Zephyr RTOS 


In Zephyr, a memory pool is a kernel object that allows memory blocks to be
dynamically allocated from a designated memory region and released back into
the pool. Its definition in the C code is shown as follows. A memory pool's buffer
(buf) is an n_max-size array of blocks of max_sz bytes at level 0, with no wasted
space between them. The size of the buffer is thus n_max × max_sz bytes.
Zephyr tries to satisfy a memory request by splitting available blocks into
smaller ones that fit the requested size as closely as possible. Each "level 0" block is a
quad-block that can be split into four smaller "level 1" blocks of equal size. Likewise,
each level 1 block is itself a quad-block that can be split again. At each level,
the four smaller blocks become buddies or partners of each other. The block size
at level l is thus max_sz/4^l.
    struct k_mem_block_id {
        u32_t pool : 8;
        u32_t level : 4;
        u32_t block : 20;
    };

    struct k_mem_block {
        void *data;
        struct k_mem_block_id id;
    };

    struct k_mem_pool_lvl {
        union {
            u32_t *bits_p;
            u32_t bits;
        };
        sys_dlist_t free_list;
    };

    struct k_mem_pool {
        void *buf;
        size_t max_sz;
        u16_t n_max;
        u8_t n_levels;
        u8_t max_inline_level;
        struct k_mem_pool_lvl *levels;
        _wait_q_t wait_q;
    };


The pool is initially configured with the parameters n_max and max_sz,
together with a third parameter min_sz. min_sz defines the minimum size of
an allocated block and must be 4 × n bytes long for some n > 0. Memory pool
blocks are recursively split into quarters until blocks of the minimum size are
obtained, at which point no further split can occur. The depth at which min_sz
blocks are allocated is n_levels, and it satisfies max_sz = min_sz × 4^n_levels.
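The quad-tree arithmetic behind these splits can be sketched in C. This is a minimal illustration and not Zephyr code. The helper names are hypothetical, and the sketch assumes max_sz is divisible by 4^l so that each division is exact.

```c
#include <stddef.h>

/* Hypothetical helper: size of a block at level l, i.e. max_sz / 4^l.
 * Assumes max_sz is divisible by 4^l. */
static size_t level_block_size(size_t max_sz, unsigned l)
{
    for (unsigned i = 0; i < l; i++) {
        max_sz /= 4;            /* every split quarters the block */
    }
    return max_sz;
}

/* Block j at level l splits into blocks 4j .. 4j+3 at level l + 1. */
static size_t first_child(size_t j) { return 4 * j; }

/* Block j at level l + 1 was split off block j / 4 at level l. */
static size_t parent_of(size_t j) { return j / 4; }

/* The four buddies (partners) of block j are first_buddy(j) .. first_buddy(j) + 3. */
static size_t first_buddy(size_t j) { return 4 * (j / 4); }
```

For instance, 172 / 4 = 43, so blocks 172 to 175 are buddies whose parent is block 43 one level up. This matches the release example discussed later with Fig. 2.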

Every memory block is composed of a level; a block index within the level,
ranging from 0 to (n_max × 4^level) − 1; and the data representing the block
start address, which is equal to buf + (max_sz/4^level) × block. We use a tuple
(level, block) to uniquely represent a block within a pool p.
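The index range and start address just described can be written as two small functions. This is a sketch under stated assumptions. The function names are hypothetical, offsets are relative to buf, and max_sz is assumed divisible by 4^level.

```c
#include <stddef.h>

/* Hypothetical helper: number of blocks at level l, n_max * 4^l. */
static size_t blocks_at_level(size_t n_max, unsigned l)
{
    size_t n = n_max;
    for (unsigned i = 0; i < l; i++) {
        n *= 4;
    }
    return n;
}

/* Hypothetical helper: offset of block (l, block) from buf,
 * i.e. (max_sz / 4^l) * block. */
static size_t block_offset(size_t max_sz, unsigned l, size_t block)
{
    size_t sz = max_sz;
    for (unsigned i = 0; i < l; i++) {
        sz /= 4;
    }
    return sz * block;
}

/* Start address (the data field) of block (l, block) inside the buffer. */
static void *block_data(void *buf, size_t max_sz, unsigned l, size_t block)
{
    return (unsigned char *)buf + block_offset(max_sz, l, block);
}
```

For a pool with n_max = 4, level 2 contains 64 blocks, indexed 0 to 63. With max_sz = 4096, block (2, 5) starts 5 × 256 = 1280 bytes after buf.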

A memory pool keeps track of how its buffer space has been split using
a linked list free_list at each level, which holds the start addresses of the free blocks at that level.
To improve the performance of coalescing partner blocks, memory pools maintain
a bitmap at each level to indicate the allocation status of each block in
the level. This structure is represented by a C union of an integer bits and an
array bits_p. The implementation can store the bitmaps of levels smaller than
max_inline_level using only the integer bits. However, the number of blocks
in levels higher than max_inline_level makes it necessary to store the bitmap
in the array bits_p. In such a design, the levels of bitmaps
actually form a forest of complete quad trees. The bit j in the bitmap of level i
is set to 1 for the block (i, j) iff it is a free block, i.e., it is in the free list at level
i. Otherwise the bit for such a block is set to 0.
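The two bitmap storage modes can be illustrated as follows. This is a hedged sketch and not the Zephyr implementation. `struct lvl_bitmap` and its helpers are hypothetical, and an explicit `is_inline` flag stands in for the real comparison against max_inline_level.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the bitmap union of struct k_mem_pool_lvl.
 * Assumption: is_inline replaces the test against max_inline_level. */
struct lvl_bitmap {
    bool is_inline;
    union {
        uint32_t bits;     /* inline: at most 32 blocks at this level */
        uint32_t *bits_p;  /* out of line: one 32-bit word per 32 blocks */
    } u;
};

/* Word holding the bit of block j (for inline levels j must be < 32). */
static uint32_t *bitmap_word(struct lvl_bitmap *lv, uint32_t j)
{
    return lv->is_inline ? &lv->u.bits : &lv->u.bits_p[j / 32];
}

/* Bit j is 1 iff block j is in this level's free list. */
static void mark_free(struct lvl_bitmap *lv, uint32_t j)
{
    *bitmap_word(lv, j) |= (uint32_t)1 << (j % 32);
}

static void mark_not_free(struct lvl_bitmap *lv, uint32_t j)
{
    *bitmap_word(lv, j) &= ~((uint32_t)1 << (j % 32));
}

static bool is_free(struct lvl_bitmap *lv, uint32_t j)
{
    return (*bitmap_word(lv, j) >> (j % 32)) & 1u;
}
```

The union lets small levels avoid a separate allocation. The cost is a branch on every bitmap access.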

Zephyr provides two kernel services, k_mem_pool_alloc and k_mem_pool_free,
for memory allocation and release respectively. The main part of the C code of
k_mem_pool_alloc is shown in Fig. 1. When an application requests a memory
block, Zephyr first computes alloc_l and free_l. alloc_l is the level with the size of
the smallest block that will satisfy the request, and free_l, with free_l ≤ alloc_l,
is the lowest level where there are free memory blocks. Since the services are
concurrent, when the service tries to allocate a free block blk from level free_l
(Line 8), blocks at that level may be allocated or merged into a bigger block
by other concurrent threads. In such a case the service backs out (Line 9) and
tells the main function k_mem_pool_alloc to retry. If blk is successfully locked for
allocation, then it is broken down to level alloc_l (Lines 11-14). The allocation
service k_mem_pool_alloc supports a timeout parameter that allows threads to wait
for that pool for a period of time when the call does not succeed. If the allocation
     1  static int pool_alloc(struct k_mem_pool *p, struct k_mem_block *block, size_t size)
     2  {
     3      ... //calculate lsizes[], alloc_l and free_l
     4      if (alloc_l < 0 || free_l < 0) {
     5          block->data = NULL;
     6          return -ENOMEM;
     7      }
     8      blk = alloc_block(p, free_l, lsizes[free_l]);
     9      if (!blk) { return -EAGAIN; }
    10      /* Iteratively break the smallest enclosing block... */
    11      for (from_l = free_l; level_empty(p, alloc_l) && from_l < alloc_l;
    12           from_l++) {
    13          blk = break_block(p, blk, from_l, lsizes);
    14      }
    15      block->id.level = alloc_l; //assign block level to the variable *block
    16      ... //assign other block info to the variable *block
    17      return 0;
    18  }
    19
    20  int k_mem_pool_alloc(struct k_mem_pool *p, struct k_mem_block *block, size_t size,
                             s32_t timeout)
    21  {
    22      ... // initialize local vars, calculate the end time for timeout.
    23      while (1) {
    24          ret = pool_alloc(p, block, size);
    25          if (ret == 0 || timeout == K_NO_WAIT ||
    26              ret == -EAGAIN || (ret && ret != -ENOMEM)) {
    27              return ret;
    28          }
    29          key = irq_lock();
    30          _pend_current_thread(&p->wait_q, timeout);
    31          _Swap(key);
    32          ... //if timeout > 0, break the loop if time out
    33      }
    34      return -EAGAIN;
    35  }


Fig. 1. The C source code of memory allocation in Zephyr v1.8.0 


fails (Line 24) and the timeout is not K_NO_WAIT, the thread is suspended
(Line 30) in the linked list wait_q and the context is switched to another thread
(Line 31).
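The level computation elided at Line 3 of Fig. 1 can be sketched as follows. This is a simplified reconstruction under stated assumptions and not the Zephyr source. The array `level_has_free[]` is hypothetical and stands in for `!level_empty(p, i)`. Block sizes are obtained by plain division by 4, which ignores any alignment the real code may apply.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: alloc_l is the deepest level whose blocks still
 * hold `size` bytes; free_l is the deepest level <= alloc_l that has a
 * free block. Both stay -1 when no level qualifies, which Fig. 1 maps
 * to -ENOMEM (Lines 4-6). */
static void pick_levels(size_t max_sz, int n_levels,
                        const bool level_has_free[],
                        size_t size, int *alloc_l, int *free_l)
{
    size_t lsize = max_sz;          /* block size at the current level */

    *alloc_l = -1;
    *free_l = -1;
    for (int i = 0; i < n_levels; i++) {
        if (i > 0) {
            lsize /= 4;             /* one level deeper: blocks are 4x smaller */
        }
        if (lsize < size) {
            break;                  /* level i (and deeper) is too small */
        }
        *alloc_l = i;
        if (level_has_free[i]) {
            *free_l = i;
        }
    }
}
```

With max_sz = 4096 and four levels (4096, 1024, 256, 64 bytes), a 200-byte request gives alloc_l = 2. If only level 0 has a free block, free_l = 0, and the loop at Lines 11-14 would break that block down twice.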

Interrupts are always enabled in both services, with the exception of
the code of the functions alloc_block and break_block, which invoke irq_lock
and irq_unlock to respectively disable and enable interrupts. Like
k_mem_pool_alloc, the execution of k_mem_pool_free is interruptible too.


3 Defining Structures and Properties of Buddy Memory Pools


As a specification at design level, we use abstract data types to represent the
complete structure of memory pools. We use an abstract reference ref in Isabelle
to define pointers to memory pools. Starting addresses of memory blocks, memory
pools, and unsigned integers in the implementation are defined as natural
numbers (nat). Linked lists used in the implementation for the elements levels
and free_list, together with the bitmaps used in bits and bits_p, are defined as
a list type. C structs are modelled in Isabelle as records of the same name as
the implementation and comprising the same data. There are two exceptions to
this: (1) k_mem_block_id and k_mem_block are merged into one single record, and (2)
the union in the struct k_mem_pool_lvl is replaced by a single list representing
the bitmap, and thus max_inline_level is removed.


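[Figure: the levels array, from level 0 to level n_levels − 1, each entry holding a free_list and a bitmap over the blocks of that level (block indices such as 160-175 and 700-703 appear at the deeper levels); the top row shows the first level 0 block in memory, max_sz bytes from buf to buf+max_sz. Legend: DIVIDED, ALLOCATED, FREE, ALLOCATING, FREEING, NOEXIST.]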

Fig. 2. Structure of memory pools 


The Zephyr implementation makes use of a bitmap to represent the state of
a memory block. The bit j of the bitmap for level i is set to 1 iff the memory
address of the memory block (i, j) is in the free list at level i. A bit j at a level
i is set to 0 under the following conditions: (1) its corresponding memory block
is allocated (ALLOCATED), (2) the memory block has been split (DIVIDED),
(3) the memory block is being split in the allocation service (ALLOCATING)
(Line 13 in Fig. 1), (4) the memory block is being coalesced in the release service
(FREEING), and (5) the memory block does not exist (NOEXIST). Instead of
only using a binary representation, our formal specification models the bitmap
using a datatype BlockState that is composed of these cases together with FREE.
The reason for this decision is to simplify proving that the bitmap shape is
well-formed. In particular, this representation makes it less complex to verify the case
in which the descendant of a free block is a non-free block. This is the case where
a free block has not been split and therefore its lower levels do not exist. We
illustrate the structure of a memory pool in Fig. 2. The top of the figure shows the
real memory of the first block at level 0.
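The relation between the two representations can be made concrete by mirroring the BlockState datatype as a C enum and projecting it onto Zephyr's single bit. This is an illustrative sketch only. The Isabelle datatype is the actual definition, and the C names here are hypothetical.

```c
/* Sketch of the BlockState datatype as a C enum (state names from the text). */
typedef enum {
    ALLOCATED,   /* handed out to a thread */
    FREE,        /* in the free list of its level */
    DIVIDED,     /* split into four children; only a logical block */
    ALLOCATING,  /* being split by the allocation service */
    FREEING,     /* being coalesced by the release service */
    NOEXIST      /* not a block in the current split configuration */
} block_state;

/* The single bit Zephyr stores: 1 iff the block is FREE.
 * The five other states all collapse to 0. */
static unsigned to_zephyr_bit(block_state s)
{
    return s == FREE ? 1u : 0u;
}
```

Because five distinct states share the bit value 0, the binary bitmap alone cannot tell a split block apart from a non-existent one. The richer datatype records that distinction directly.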

The structural properties clarify the constraints on and consistency of quad 
trees, free block lists, the memory pool configuration, and waiting threads. All 
of them are thought of as invariants on the kernel state and have been formally 
verified on the formal specification in Isabelle/HOL. 
Well-Shaped Bitmaps. We say that the logical memory block j at a level i
physically exists iff the bit j of the bitmap for the level i is ALLOCATED, FREE,
ALLOCATING, or FREEING, represented by the predicate is_memblock. We do not
consider blocks marked as DIVIDED as physical blocks, since they are only logical
blocks containing other blocks. Threads may split and coalesce memory blocks.
A valid forest is defined by the following rules: (1) the parent bit of an existing
memory block is DIVIDED and its child bits are NOEXIST, denoted by the
predicate noexist_bits that checks, for a given bitmap b and a position j, that
nodes b!j to b!(j + 3) are set as NOEXIST; (2) the parent bit of a DIVIDED
block is also DIVIDED; and (3) the child bits of a NOEXIST bit are also
NOEXIST and its parent cannot be a DIVIDED block. The property is defined as the
predicate inv_bitmap(s), where s is the state.

There are two additional properties on bitmaps. First, the address space of
any memory pool cannot be empty, i.e., the bits at level 0 have to be different
from NOEXIST. Second, the allocation algorithm may split a memory block into
smaller ones, but not those blocks at the lowest level (i.e., level n_levels − 1);
therefore the bits at the lowest level cannot be DIVIDED. The first property
is defined as inv_bitmap0(s) and the second as inv_bitmapn(s).
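The three bitmap properties can be restated as executable checks over explicit per-level arrays. This sketch is for intuition only and is not the Isabelle definition. The layout is an assumption: `bm[l]` is the bitmap of level l with n_max × 4^l entries. The function names follow the predicates in the text.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ALLOCATED, FREE, DIVIDED, ALLOCATING, FREEING, NOEXIST } block_state;

/* is_memblock: the block physically exists. */
static bool is_memblock(block_state s)
{
    return s == ALLOCATED || s == FREE || s == ALLOCATING || s == FREEING;
}

/* noexist_bits: b[j] .. b[j + 3] are all NOEXIST. */
static bool noexist_bits(const block_state *b, size_t j)
{
    for (size_t k = 0; k < 4; k++) {
        if (b[j + k] != NOEXIST) {
            return false;
        }
    }
    return true;
}

/* inv_bitmap: rules (1)-(3) over every level; bm[l] has n_max * 4^l entries. */
static bool inv_bitmap(const block_state *const *bm, size_t n_max, size_t n_levels)
{
    size_t len = n_max;
    for (size_t l = 0; l < n_levels; l++, len *= 4) {
        for (size_t j = 0; j < len; j++) {
            block_state s = bm[l][j];
            bool has_parent = l > 0;
            bool parent_divided = has_parent && bm[l - 1][j / 4] == DIVIDED;
            bool has_kids = l + 1 < n_levels;

            if (is_memblock(s) || s == DIVIDED) {
                /* rules (1) and (2): the parent must be DIVIDED */
                if (has_parent && !parent_divided) return false;
                /* rule (1): children of an existing block are NOEXIST */
                if (is_memblock(s) && has_kids && !noexist_bits(bm[l + 1], 4 * j)) return false;
            } else {
                /* rule (3): NOEXIST has NOEXIST children and a non-DIVIDED parent */
                if (parent_divided) return false;
                if (has_kids && !noexist_bits(bm[l + 1], 4 * j)) return false;
            }
        }
    }
    return true;
}

/* inv_bitmap0: no NOEXIST bit at level 0. */
static bool inv_bitmap0(const block_state *const *bm, size_t n_max)
{
    for (size_t j = 0; j < n_max; j++) {
        if (bm[0][j] == NOEXIST) return false;
    }
    return true;
}

/* inv_bitmapn: no DIVIDED bit at the lowest level n_levels - 1. */
static bool inv_bitmapn(const block_state *const *bm, size_t n_max, size_t n_levels)
{
    size_t len = n_max;
    for (size_t l = 1; l < n_levels; l++) len *= 4;
    for (size_t j = 0; j < len; j++) {
        if (bm[n_levels - 1][j] == DIVIDED) return false;
    }
    return true;
}
```

For example, a one-block pool with two levels, where level 0 is DIVIDED and level 1 holds {FREE, ALLOCATED, FREE, FREE}, satisfies all three checks. If level 0 were FREE instead, rule (1) would fail, because the children of an existing block must be NOEXIST.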


Consistency of the Memory Configuration. The configuration of a memory
pool is set when it is initialized. Since the minimum block size is aligned to 4
bytes, there must exist an n > 0 such that the maximum size of a pool is
equal to 4 × n × 4^n_levels, relating the number of levels of a level 0 block to
its maximum size. Moreover, the number of blocks at level 0 and the number
of levels have to be greater than zero, since the memory pool cannot be empty.
The number of levels is equal to the length of the pool levels list. Finally, the
length of the bitmap at level i should be n_max × 4^i. This property is defined
as inv_mempool_info(s).
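These configuration constraints can be checked for a single pool as sketched below. This is a hypothetical restatement, not the Isabelle predicate. The parameters `levels_len` and `bitmap_len[]` stand for the length of the levels list and of each level's bitmap, and the sketch assumes 4^(n_levels + 1) fits in a size_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check of inv_mempool_info for one pool. */
static bool inv_mempool_info(size_t max_sz, size_t n_max, size_t n_levels,
                             size_t levels_len, const size_t *bitmap_len)
{
    if (n_max == 0 || n_levels == 0 || levels_len != n_levels) {
        return false;
    }
    /* exists n > 0 with max_sz == 4 * n * 4^n_levels, i.e. max_sz is a
     * positive multiple of 4^(n_levels + 1) */
    size_t unit = 4;
    for (size_t i = 0; i < n_levels; i++) {
        unit *= 4;
    }
    if (max_sz == 0 || max_sz % unit != 0) {
        return false;
    }
    /* the bitmap of level i holds n_max * 4^i entries */
    size_t expected = n_max;
    for (size_t i = 0; i < n_levels; i++, expected *= 4) {
        if (bitmap_len[i] != expected) {
            return false;
        }
    }
    return true;
}
```

With max_sz = 4096 and n_levels = 4, the witness is n = 4096 / (4 × 4^4) = 4.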


Memory Partition Property. Memory blocks partition the pool they belong
to, so the absence of overlapping blocks and of memory leaks are critical
properties. For a memory block of index j at level i, its address space is the interval
[j × (max_sz/4^i), (j + 1) × (max_sz/4^i)). For any relative memory address addr
in the memory domain of a memory pool, and hence addr < n_max × max_sz,
there is one and only one memory block whose address space contains addr.
Here, we use relative addresses for addr. The property is defined as mem_part(s).
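A brute-force, executable reading of mem_part is sketched below. It counts the physically existing blocks whose interval contains each relative address. This is an illustration under assumptions and not the mechanized proof. It uses the same hypothetical per-level array layout as a plain bitmap-per-level encoding, and assumes max_sz is divisible by 4^(n_levels − 1).

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ALLOCATED, FREE, DIVIDED, ALLOCATING, FREEING, NOEXIST } block_state;

static bool is_memblock(block_state s)
{
    return s == ALLOCATED || s == FREE || s == ALLOCATING || s == FREEING;
}

/* Number of existing blocks whose interval [j*sz, (j+1)*sz) contains addr.
 * At level l the only candidate is j = addr / (max_sz / 4^l). */
static size_t owners_of(const block_state *const *bm, size_t max_sz,
                        size_t n_levels, size_t addr)
{
    size_t count = 0;
    size_t sz = max_sz;                      /* block size at level l */
    for (size_t l = 0; l < n_levels; l++, sz /= 4) {
        if (is_memblock(bm[l][addr / sz])) {
            count++;
        }
    }
    return count;
}

/* mem_part: every relative address addr < n_max * max_sz has exactly one owner. */
static bool mem_part(const block_state *const *bm, size_t n_max,
                     size_t max_sz, size_t n_levels)
{
    for (size_t addr = 0; addr < n_max * max_sz; addr++) {
        if (owners_of(bm, max_sz, n_levels, addr) != 1) {
            return false;   /* 0 owners: a leak; 2 or more: an overlap */
        }
    }
    return true;
}
```

A count of 0 signals a leak (a hole in the pool) and a count greater than 1 signals overlapping blocks. These are exactly the two failures the partition property rules out.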

From the invariants of the bitmap, we derive the general property for the 
memory partition. 


Theorem 1 (Memory Partition). For any kernel state s, if the memory pools
in s are consistent in their configuration and their bitmaps are well-shaped, the
memory pools satisfy the partition property in s:


$$\mathit{inv\_mempool\_info}(s) \land \mathit{inv\_bitmap}(s) \land \mathit{inv\_bitmap0}(s) \land \mathit{inv\_bitmapn}(s) \Longrightarrow \mathit{mem\_part}(s)$$
Together with the memory partition property, pools must also satisfy the
following:


No Partner Fragmentation. The memory release algorithm in Zephyr coalesces
free partner memory blocks into blocks as large as possible at every level
below the root level (blocks at level 0 are never merged). Thus, a memory pool does
not contain four FREE partner bits.
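The no-fragmentation property can be sketched as a scan over the partner groups 4k..4k+3 of every level below the root. This is an illustrative check under the same hypothetical array layout as above (`bm[l]` has n_max × 4^l entries). It is not code from the verified model.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ALLOCATED, FREE, DIVIDED, ALLOCATING, FREEING, NOEXIST } block_state;

/* True iff no group of four partner bits at levels >= 1 is entirely FREE.
 * Level 0 is skipped because level 0 blocks are never merged. */
static bool no_partner_fragmentation(const block_state *const *bm,
                                     size_t n_max, size_t n_levels)
{
    size_t len = n_max * 4;                   /* number of blocks at level 1 */
    for (size_t l = 1; l < n_levels; l++, len *= 4) {
        for (size_t j = 0; j < len; j += 4) { /* partner group j .. j+3 */
            if (bm[l][j] == FREE && bm[l][j + 1] == FREE &&
                bm[l][j + 2] == FREE && bm[l][j + 3] == FREE) {
                return false;
            }
        }
    }
    return true;
}
```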


Validity of Free Block Lists. The free list at one level keeps the starting
addresses of free memory blocks. The memory management ensures that the
addresses in the list are valid, i.e., they are different from each other and aligned
to the block size, which at a level i is given by max_sz/4^i. Moreover, a memory
block is in the free list iff the corresponding bit of the bitmap is FREE.
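The three free-list conditions for one level can be checked as below. This is a sketch under assumptions. The free list is modelled as an array of relative start addresses instead of Zephyr's sys_dlist_t, the function name is hypothetical, and `lsize` (the block size at that level) must be non-zero.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ALLOCATED, FREE, DIVIDED, ALLOCATING, FREEING, NOEXIST } block_state;

/* Checks one level: addresses are distinct and aligned to lsize, and a
 * block is listed iff its bit is FREE. */
static bool free_list_valid(const size_t *free_list, size_t n_free,
                            const block_state *bits, size_t nblocks, size_t lsize)
{
    for (size_t a = 0; a < n_free; a++) {
        size_t addr = free_list[a];
        if (addr % lsize != 0) {
            return false;                           /* misaligned */
        }
        size_t j = addr / lsize;
        if (j >= nblocks || bits[j] != FREE) {
            return false;                           /* listed but not FREE */
        }
        for (size_t b = 0; b < a; b++) {
            if (free_list[b] == addr) {
                return false;                       /* duplicate entry */
            }
        }
    }
    /* The entries are distinct FREE blocks, so "every FREE block is listed"
     * reduces to comparing the counts. */
    size_t n_free_bits = 0;
    for (size_t j = 0; j < nblocks; j++) {
        if (bits[j] == FREE) {
            n_free_bits++;
        }
    }
    return n_free_bits == n_free;
}
```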


Non-overlapping of Memory Pools. The memory spaces of the set of pools
defined in a system must be disjoint, so the memory addresses of a pool do
not belong to the memory space of any other pool.
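Treating each pool as the half-open range [buf, buf + n_max × max_sz), the disjointness requirement becomes a pairwise interval test. The `struct pool_span` below is a hypothetical simplification of a pool's configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool extent: [start, start + n_max * max_sz). */
struct pool_span {
    uintptr_t start;
    size_t n_max;
    size_t max_sz;
};

static uintptr_t span_end(const struct pool_span *p)
{
    return p->start + p->n_max * p->max_sz;
}

/* True iff every pair of pools occupies disjoint half-open ranges. */
static bool pools_disjoint(const struct pool_span *pools, size_t n)
{
    for (size_t a = 0; a < n; a++) {
        for (size_t b = a + 1; b < n; b++) {
            if (pools[a].start < span_end(&pools[b]) &&
                pools[b].start < span_end(&pools[a])) {
                return false;   /* the two ranges intersect */
            }
        }
    }
    return true;
}
```

Adjacent pools are allowed. A pool ending at 4096 and another starting at 4096 do not share any address.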


Other Properties. The state of a suspended thread in wait_q has to be consistent
with the threads waiting for a memory pool. Threads can only be blocked
once, and those threads waiting for available memory blocks have to be in a
BLOCKED state. During allocation and release of a memory block, blocks of the
tree may temporarily be manipulated during the coalescence and division process.
A block can be manipulated by only one thread at a time, and the state bit of a
block being temporarily manipulated has to be FREEING or ALLOCATING.


4 Formalizing Zephyr Memory Management 


For the purpose of formal verification of event-driven systems such as OSs, we
have developed π-Core, a framework for rely-guarantee reasoning of components
running in parallel invoking events. π-Core supports concurrent OS features
such as shared-variable concurrency of multiple threads, interruptible
execution of handlers, self-suspending threads, and rescheduling. In this
section, we first introduce the modelling language in π-Core and an execution
model of Zephyr using this language. Then we discuss in detail the low-level
design specification for the kernel services that the memory management provides.
Since this work focuses on the memory management, we only provide very
abstract models for other kernel functionalities such as the kernel scheduling and
thread control.


4.1 Event-Based Execution Model of Zephyr 


The Language in π-Core. Interrupt handlers in π-Core are considered as
reaction services which are represented as events:


EVENT ℰ [p1, ..., pn] @ κ WHEN g THEN P END
In this representation, an event is a parametrized imperative program P with
a name ℰ, a list of service input parameters p1, ..., pn, and a guard condition
g that determines the conditions triggering the event. In addition to the input
parameters, an event has a special parameter κ which indicates the execution
context, e.g., the scheduler and the thread invoking the event. The imperative
commands of an event body P in π-Core are standard sequential constructs
such as conditional execution, loop, and sequential composition of programs. It
also includes a synchronization construct for concurrent processes represented
by AWAIT b THEN P END. The body P is executed atomically if and only
if the boolean condition b holds, and does not progress otherwise. ATOM P END
denotes an Await statement whose guard is True.

Threads and kernel processes have their own execution context and local
states. Each of them is modelled in π-Core as a set of events called an event system,
denoted as ESYS S = {ℰ1, ..., ℰn}. The operational semantics of an event
system is the sequential composition of the execution of the events composing
it. It consists of the continuous evaluation of the guards of the system events.
From the set of events for which the associated guard g holds in the current
state, one event ℰ is non-deterministically selected to be triggered, and its body
P is executed. After P finishes, the evaluation of the guards starts again, looking
for the next event to be executed. Finally, π-Core has a construct for parallel
composition of event systems esys_0 || ... || esys_n, which interleaves the execution
of the events composing each event system esys_i for 0 ≤ i ≤ n.
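One round of this guard-evaluate-select-execute cycle can be sketched in C. This is a hedged illustration and not π-Core. The state, the event struct, and the `choose` callback that resolves the non-deterministic choice are all hypothetical. For brevity, the sketch runs each body atomically, whereas in π-Core event bodies may interleave with other event systems.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 16   /* sketch assumes an event system has at most 16 events */

struct state { int x; };   /* hypothetical shared state */

struct event {
    bool (*guard)(const struct state *);  /* WHEN g */
    void (*body)(struct state *);         /* THEN P END (atomic in this sketch) */
};

/* Collect the events whose guard holds, let `choose` pick one of them,
 * and run its body. Returns the executed index, or -1 if no guard holds. */
static int esys_step(const struct event *evs, size_t n, struct state *s,
                     size_t (*choose)(size_t n_enabled))
{
    size_t enabled[MAX_EVENTS];
    size_t k = 0;

    for (size_t i = 0; i < n && i < MAX_EVENTS; i++) {
        if (evs[i].guard(s)) {
            enabled[k++] = i;
        }
    }
    if (k == 0) {
        return -1;
    }
    size_t pick = enabled[choose(k) % k];
    evs[pick].body(s);
    return (int)pick;
}

/* Two example events over the hypothetical state. */
static bool below3(const struct state *s) { return s->x < 3; }
static void incr(struct state *s) { s->x++; }
static bool atleast3(const struct state *s) { return s->x >= 3; }
static void reset(struct state *s) { s->x = 0; }

/* A deterministic chooser for testing: always the first enabled event. */
static size_t pick_first(size_t n_enabled) { (void)n_enabled; return 0; }
```

Repeatedly calling `esys_step` models the event system's loop. Interleaving calls on several event systems gives a rough picture of the parallel composition.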


[Figure: threads t1 and t2 alternate between user mode and kernel mode; t1 enters k_mem_pool_free via syscall(free), is interrupted by the scheduler S (sched), and execution switches to t2. Event systems shown on the right:
esys_t1 = (⋃_b {free(b)@t1}) ∪ (⋃_(p,sz,to) {alloc(p, sz, to)@t1})
esys_t2 = (⋃_b {free(b)@t2}) ∪ (⋃_(p,sz,to) {alloc(p, sz, to)@t2})
esys_sched = ⋃_t {sched(t)@S}]

Fig. 3. An execution model of Zephyr memory management 


Execution Model of Zephyr. If we do not consider its initialization, an OS
kernel can be considered as a reactive system that is in an idle loop until it receives
an interrupt, which is handled by an interrupt handler. Whereas the execution of interrupt
handlers is atomic in sequential kernels, it can be interrupted in concurrent
kernels [6,22], allowing services invoked by threads to be interrupted
and resumed later. In the execution model of Zephyr, we consider a scheduler
S and a set of threads t1, ..., tn. In this model, the execution of the scheduler
is atomic since kernel services cannot interrupt it. But kernel services can be
interrupted via the scheduler, i.e., the execution of a memory service invoked by
a thread ti may be interrupted by the kernel scheduler to execute a thread tj.
Figure 3 illustrates the Zephyr execution model, where solid lines represent execution
steps of the threads/kernel services and dotted lines mean the suspension of the
thread/code. For instance, the execution of k_mem_pool_free in thread t1 is interrupted
by the scheduler, and the context is switched to thread t2, which invokes
k_mem_pool_alloc. During the execution of t2, the kernel service may suspend the
thread and switch to another thread by invoking rescheduling. Later, the execution
is switched back to t1 and continues the execution of k_mem_pool_free in
a different state from when it was interrupted.

The event systems of Zephyr are illustrated in the right part of Fig. 3. A
user thread ti invokes allocation/release services, thus the event system for ti is
esys_ti, a set composed of the events alloc and free. The input parameters of these
events correspond to the arguments of the service implementation, which are
constrained by the guard of each service. Together with the system users, we model
the event system for the scheduler esys_sched, consisting of a unique event sched
whose argument is a thread t to be scheduled when t is in the READY state. The
formal specification of the memory management is the parallel composition of the
event systems for the threads and the scheduler: esys_t1 || ... || esys_tn || esys_sched.


Thread Context and Preemption. Events are parametrized by a thread identifier
used to access the execution context of the thread invoking them. As shown
in Fig. 3, the execution of an event by a thread can be stopped by
the scheduler and resumed later. This behaviour is modelled using a global
variable cur that indicates the thread that is currently scheduled and
being executed, and by allowing an event parametrized by a thread t to progress
only when t is scheduled. This is achieved by using the expression t ▷ p ≡
AWAIT cur = t THEN p END, so an event invoked by a thread t only progresses
when t is scheduled. This scheme allows rely-guarantee reasoning to be used for the concurrent
execution of threads on mono-core architectures, where only the scheduled
thread is able to modify the memory.
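The effect of t ▷ p can be mimicked by a step function that fires only while `cur` names the calling thread. This is a hypothetical sketch. The state holds only `cur` and a counter, and `bump` and `sched` are illustrative stand-ins for an event body and the scheduler event.

```c
#include <stdbool.h>

/* Hypothetical global state: only `cur` and one counter are modelled. */
struct kstate {
    int cur;        /* identifier of the currently scheduled thread */
    int counter;
};

/* t |> p  ==  AWAIT cur = t THEN p END: the step of thread t fires only
 * while t is scheduled; otherwise t is blocked and the state is unchanged. */
static bool guarded_step(struct kstate *s, int t, void (*p)(struct kstate *))
{
    if (s->cur != t) {
        return false;   /* await condition does not hold */
    }
    p(s);               /* runs as one atomic step */
    return true;
}

static void bump(struct kstate *s) { s->counter++; }

/* Scheduler event: switch the running thread. */
static void sched(struct kstate *s, int t) { s->cur = t; }
```

A blocked thread leaves the state untouched. This is the property that the guarantee condition later relies on when the scheduled thread is not the invoking one.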


4.2 Formal Specification of Memory Management Services 


This section discusses the formal specification of the memory management ser- 
vices. These services deal with the initialization of pools, and memory allocation 
and release. 


System State. The system state includes the memory model introduced in
Sect. 3, together with the thread under execution in variable cur and local variables
of the memory services used to keep temporary changes to the structure,
guards in conditional and loop statements, and index accesses. The memory
model is represented as a set mem_pools storing the references of all memory
pools and a mapping mem_pool_info to query a pool by a pool reference. Local
variables are modelled as total functions from threads to variable values, representing
that the event is accessing the thread context. In the formal model of
the events, we represent access to a state component c using ‘c, and the value
of a local component c for the thread t is represented as ‘c t. Local variables
allocating_node and freeing_node are relevant for the memory services, storing
the temporary blocks being split/coalesced in the alloc/release services respectively.


Memory Pool Initialization. Zephyr defines and initializes memory pools
at compile time by constructing a static variable of type struct k_mem_pool.
The implementation initializes each pool with n_max level 0 blocks of size
max_sz bytes. Bitmaps of level 0 are set to 1 and the free list contains all level 0
blocks. Bitmaps and free lists of other levels are initialized to 0 and to the empty
list respectively. In the formal model, we specify a state corresponding to the
implementation initial state and we show that it belongs to the set of states
satisfying the invariant.
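The initial state just described can be sketched with fixed-size arrays. This is a simplified illustration and not Zephyr's compile-time initializer. `struct pool_state`, `MAX_LEVELS` and `MAX_BLOCKS` are hypothetical. Bitmaps use one byte per bit, and free lists hold relative addresses.

```c
#include <stddef.h>

#define MAX_LEVELS 3
#define MAX_BLOCKS 64   /* assumption: n_max * 4^(n_levels - 1) <= MAX_BLOCKS */

/* Hypothetical, simplified pool state. */
struct pool_state {
    size_t max_sz, n_max, n_levels;
    unsigned char bits[MAX_LEVELS][MAX_BLOCKS];  /* 1 = in free list */
    size_t free_list[MAX_LEVELS][MAX_BLOCKS];    /* relative start addresses */
    size_t n_free[MAX_LEVELS];
};

/* Returns 0 on success, -1 if the configuration does not fit the sketch. */
static int pool_init(struct pool_state *p, size_t max_sz, size_t n_max,
                     size_t n_levels)
{
    if (n_max == 0 || n_levels == 0 || n_levels > MAX_LEVELS) {
        return -1;
    }
    size_t deepest = n_max;                 /* blocks at level n_levels - 1 */
    for (size_t l = 1; l < n_levels; l++) {
        deepest *= 4;
    }
    if (deepest > MAX_BLOCKS) {
        return -1;
    }

    p->max_sz = max_sz;
    p->n_max = n_max;
    p->n_levels = n_levels;
    for (size_t l = 0; l < MAX_LEVELS; l++) {   /* all bits 0, lists empty */
        p->n_free[l] = 0;
        for (size_t j = 0; j < MAX_BLOCKS; j++) {
            p->bits[l][j] = 0;
        }
    }
    for (size_t j = 0; j < n_max; j++) {        /* every level 0 block is free */
        p->bits[0][j] = 1;
        p->free_list[0][p->n_free[0]++] = j * max_sz;
    }
    return 0;
}
```

In BlockState terms, this state marks level 0 as FREE and every deeper block as NOEXIST, which is the shape the bitmap invariants expect.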


     1  WHILE ‘free-block-r t DO
     2    t ▷ ‘lsz := ‘lsz (t := ‘lsizes t ! (‘lvl t));;
     3    t ▷ ‘blk := ‘blk (t := block-ptr (‘mem-pool-info (pool b)) (‘lsz t) (‘bn t));;
     4    t ▷ ATOM
     5      ‘mem-pool-info := set-bit-free ‘mem-pool-info (pool b) (‘lvl t) (‘bn t);;
     6      ‘freeing-node := ‘freeing-node (t := None);;
     7      IF ‘lvl t > 0 ∧ partner-bits (‘mem-pool-info (pool b)) (‘lvl t) (‘bn t) THEN
     8        FOR ‘i := ‘i (t := 0); ‘i t < 4; ‘i := ‘i (t := ‘i t + 1) DO
     9          ‘bb := ‘bb (t := (‘bn t div 4) * 4 + ‘i t);;
    10          ‘mem-pool-info := set-bit-noexist ‘mem-pool-info (pool b) (‘lvl t) (‘bb t);;
    11          ‘block-pt := ‘block-pt (t := block-ptr (‘mem-pool-info (pool b)) (‘lsz t) (‘bb t));;
    12          IF ‘bn t ≠ ‘bb t ∧ block-fits (‘mem-pool-info (pool b)) (‘block-pt t) (‘lsz t) THEN
    13            ‘mem-pool-info := ‘mem-pool-info ((pool b) :=
    14              remove-free-list (‘mem-pool-info (pool b)) (‘lvl t) (‘block-pt t))
    15          FI
    16        ROF;;
    17        ‘lvl := ‘lvl (t := ‘lvl t - 1);;
    18        ‘bn := ‘bn (t := ‘bn t div 4);;
    19        ‘mem-pool-info := set-bit-freeing ‘mem-pool-info (pool b) (‘lvl t) (‘bn t);;
    20        ‘freeing-node := ‘freeing-node (t := Some ⦇pool = (pool b), level = (‘lvl t),
    21          block = (‘bn t), data = block-ptr (‘mem-pool-info (pool b))
    22          ((ALIGN4 (max-sz (‘mem-pool-info (pool b)))) div (4 ^ (‘lvl t))) (‘bn t)⦈)
    23      ELSE
    24        IF block-fits (‘mem-pool-info (pool b)) (‘blk t) (‘lsz t) THEN
    25          ‘mem-pool-info := ‘mem-pool-info ((pool b) :=
    26            append-free-list (‘mem-pool-info (pool b)) (‘lvl t) (‘blk t))
    27        FI;;
    28        ‘free-block-r := ‘free-block-r (t := False)
    29      FI
    30    END
    31  OD


Fig. 4. The π-Core specification of free_block
Memory Allocation/Release Services. The C code of Zephyr uses the recursive
function free_block to coalesce free partner blocks, and the break statement to
stop the execution of loop statements; neither is supported by the imperative
language in π-Core. The formal specification overcomes this by transforming
the recursion into a loop controlled by the recursion condition, and by using a control
variable to exit loops with breaks when the condition to execute the loop
break is satisfied. Additionally, the memory management services use the atomic
body irq_lock(); P; irq_unlock(); to keep interrupt handlers reentrant by disabling
interrupts. We simplify this behaviour in the specification using an
ATOM statement, which prevents the service from being interrupted at that point. The
rest of the formal specification closely follows the implementation, where variables
are modified using higher-order functions changing the state as the code
does. The reason for using Isabelle/HOL functions is that π-Core does not
provide a semantics for expressions; instead, it uses state transformers, i.e.,
higher-order functions that change the state.

Figure 4 illustrates the π-Core specification of the free_block function invoked
by k_mem_pool_free when releasing a memory block. The code accesses the following
variables: lsz, lsizes, and lvl to keep information about the current level;
blk, bn, and bb to represent the address and number of the block currently being
accessed; freeing_node to represent the node being freed; and i to iterate over
blocks. Additionally, the model includes the component free_block_r to model
the recursion condition. To simplify the representation, the model uses predicates
and functions to access and modify the state. Due to space constraints, we are
unable to provide a detailed explanation of these functions. However, the names of
the functions can help the reader to better understand their functionality. We
refer readers to the Isabelle/HOL sources for the complete specification of the
formal model.

In the C code, free_block is a recursive function with two conditions: (1) the
block being released belongs to a level higher than zero, since blocks at level
zero cannot be merged; and (2) the partner bits of the block being released are
FREE, so they can be merged into a bigger block. We represent (1) with the
predicate ‘lvl t > 0 and (2) with the predicate partner_bit_free. The formal
specification follows the same structure, translating the recursive function into a
loop that is controlled by a variable mimicking the recursion.

The formal specification for free_block first releases an allocated memory
block bn by setting its bit to FREEING. Then, the loop statement sets the bit of block bn
to FREE (Line 5), and checks whether the iteration/recursion condition holds in
Line 7. If the condition holds, the partner bits are set to NOEXIST and their
addresses are removed from the free list for this level (Lines 12-14). Then, it sets the
parent block bit to FREEING (Lines 17-22), and updates the variables controlling
the current block and level numbers, before going back to the beginning
of the loop. If the iteration condition does not hold, it sets the bit to FREE,
adds the block to the free list (Lines 24-28), and sets the loop condition to
false to end the procedure. This function is illustrated in Fig. 2. The block 172
is released by a thread and, since its partner blocks (blocks 173-175) are free,
Zephyr coalesces the four blocks and sets their parent block 43 as FREEING.
The coalescence continues iteratively if the partners of block 43 are all free.
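The recursion-to-loop translation can be sketched in C over BlockState bitmaps alone. This is a hedged sketch, not the Zephyr code and not the π-Core model. Free-list updates, the ATOM brackets and the per-thread variables are omitted. The names are hypothetical, and `again` plays the role of free_block_r.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ALLOCATED, FREE, DIVIDED, ALLOCATING, FREEING, NOEXIST } block_state;

/* bm[l] is the bitmap of level l. Releases block (lvl, bn) and coalesces
 * upwards while all four partners are FREE. Returns the level of the block
 * that finally stays FREE. */
static size_t free_block_loop(block_state **bm, size_t lvl, size_t bn)
{
    bool again = true;                      /* loop control (free_block_r) */

    bm[lvl][bn] = FREEING;                  /* block being released */
    while (again) {
        bm[lvl][bn] = FREE;                 /* cf. Line 5 */
        size_t first = (bn / 4) * 4;        /* first of the four partners */
        bool partners_free = lvl > 0;       /* level 0 is never merged */
        for (size_t i = 0; i < 4 && partners_free; i++) {
            if (bm[lvl][first + i] != FREE) {
                partners_free = false;
            }
        }
        if (partners_free) {                /* recursion condition, cf. Line 7 */
            for (size_t i = 0; i < 4; i++) {
                bm[lvl][first + i] = NOEXIST;
            }
            lvl--;
            bn /= 4;                        /* e.g. 172 / 4 = 43 */
            bm[lvl][bn] = FREEING;          /* parent being coalesced */
        } else {
            again = false;                  /* keep the block FREE and stop */
        }
    }
    return lvl;
}
```

Each iteration moves one level up, so the level number serves as a natural termination measure for the loop.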
5 Correctness and Rely-Guarantee Proof


We have proven the correctness of the buddy memory management in Zephyr using the
rely-guarantee proof system of π-Core. We ensure functional correctness of each
kernel service w.r.t. the defined pre/post-conditions, invariant preservation, termination
of loop statements in the kernel services, the preservation of the memory
configuration during small steps of kernel services, and the separation of local variables
of threads. In this section, we introduce the rely-guarantee proof system of
π-Core and how these properties are specified and verified using it.


5.1 Rely-Guarantee Proof Rules and Verification 


A rely-guarantee specification for a system is a quadruple RGCond =
(pre, R, G, pst), where pre is the pre-condition, R is the rely condition, G is
the guarantee condition, and pst is the post-condition. The intuitive meaning
of a valid rely-guarantee specification for a parallel component P, denoted by
⊨ P sat (pre, R, G, pst), is that if P is executed from an initial state s ∈ pre
and any environment transition belongs to the rely relation R, then the state
transitions carried out by P belong to the guarantee relation G and the final
states belong to pst.

We have defined a rely-guarantee axiomatic proof system for the π-Core specification
language to prove the validity of rely-guarantee specifications, and proven
its soundness in Isabelle/HOL with regard to the definition of validity. Some of
the rules composing the axiomatic reasoning system are shown in Fig. 5.


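$$\frac{\vdash P\ \mathbf{sat}\ (pre \cap b \cap \{V\},\ Id,\ UNIV,\ \{s \mid (V, s) \in G\} \cap pst) \qquad stable(pre, R) \qquad stable(pst, R)}{\vdash (\mathbf{Await}\ b\ P)\ \mathbf{sat}\ (pre, R, G, pst)}\ [\mathrm{AWAIT}]$$

$$\frac{\vdash body(\alpha)\ \mathbf{sat}\ (pre \cap guard(\alpha), R, G, pst) \qquad stable(pre, R) \qquad \forall s.\ (s, s) \in G}{\vdash \mathbf{Event}\ \alpha\ \mathbf{sat}\ (pre, R, G, pst)}\ [\mathrm{BASICEVT}]$$

$$\frac{\vdash P\ \mathbf{sat}\ (loopinv \cap b, R, G, loopinv) \qquad loopinv \cap \neg b \subseteq pst \qquad \forall s.\ (s, s) \in G \qquad stable(loopinv, R) \qquad stable(pst, R)}{\vdash (\mathbf{While}\ b\ P)\ \mathbf{sat}\ (loopinv, R, G, pst)}\ [\mathrm{WHILE}]$$

$$\frac{\begin{array}{c}(1)\ \forall \kappa.\ \vdash PS(\kappa)\ \mathbf{sat}\ (pres_\kappa, Rs_\kappa, Gs_\kappa, psts_\kappa) \qquad (2)\ \forall \kappa.\ pre \subseteq pres_\kappa \qquad (3)\ \forall \kappa.\ psts_\kappa \subseteq pst \\ (4)\ \forall \kappa.\ Gs_\kappa \subseteq G \qquad (5)\ \forall \kappa.\ R \subseteq Rs_\kappa \qquad (6)\ \forall \kappa, \kappa'.\ \kappa \neq \kappa' \longrightarrow Gs_\kappa \subseteq Rs_{\kappa'}\end{array}}{\vdash PS\ \mathbf{sat}\ (pre, R, G, pst)}\ [\mathrm{PAR}]$$


Fig. 5. Typical rely-guarantee proof rules in π-Core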


A predicate P is stable w.r.t. a relation R, represented as stable(P, R), when
for any pair of states (s, t) such that s ∈ P and (s, t) ∈ R, we have t ∈ P. The
intuitive meaning is that an environment represented by R does not affect the
satisfiability of P. The parallel rule in Fig. 5 establishes compositionality of the
proof system: verification of the parallel specification can be reduced to
the verification of individual event systems first, and then to the verification
of individual events. It is necessary that each event system PS(κ) satisfies its
specification (pres_κ, Rs_κ, Gs_κ, psts_κ) (Premise 1); the pre-condition for the parallel
composition implies all the event systems' pre-conditions (Premise 2); the
overall post-condition must be a logical consequence of all post-conditions of
event systems (Premise 3); since an action transition of the concurrent system
is performed by one of its event systems, the guarantee condition Gs_κ of each
event system must be a subset of the overall guarantee condition G (Premise 4);
every transition of the overall environment R must also be an environment
transition in Rs_κ for each event system κ (Premise 5); and an action transition of an
event system κ should be included in the rely condition of every other event system
κ', where κ ≠ κ' (Premise 6).
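On a finite state space, stability and the relational inclusions used in premises (4)-(6) become plain loops. The sketch below is a toy illustration under stated assumptions: a four-state space, and predicates and relations encoded as boolean arrays. It is not how π-Core represents them in Isabelle/HOL.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSTATES 4   /* assumption: a toy state space {0, 1, 2, 3} */

/* stable(P, R): for all (s, t) in R, s in P implies t in P. */
static bool stable(const bool P[NSTATES], bool R[NSTATES][NSTATES])
{
    for (size_t s = 0; s < NSTATES; s++) {
        for (size_t t = 0; t < NSTATES; t++) {
            if (P[s] && R[s][t] && !P[t]) {
                return false;   /* an environment step leaves P */
            }
        }
    }
    return true;
}

/* A is a subset of B as relations, e.g. premise (6): Gs_k within Rs_k'. */
static bool rel_subset(bool A[NSTATES][NSTATES], bool B[NSTATES][NSTATES])
{
    for (size_t s = 0; s < NSTATES; s++) {
        for (size_t t = 0; t < NSTATES; t++) {
            if (A[s][t] && !B[s][t]) {
                return false;
            }
        }
    }
    return true;
}
```

For P = {0, 1}, an environment that only swaps states 0 and 1 keeps P stable. Adding the step 1 → 2 breaks stability.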

To prove loop termination, loop invariants are parametrized with a logical
variable α. It suffices to show total correctness of a loop statement by the following
proposition, where loopinv(α) is the parametrized invariant, in which the
logical variable is used to find a convergent relation showing that the number of
iterations of the loop is finite.


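$$\vdash P\ \mathbf{sat}\ (loopinv(\alpha) \cap \{\alpha > 0\},\ R,\ G,\ \exists \beta < \alpha.\ loopinv(\beta)) \ \land\ loopinv(\alpha) \cap \{\alpha > 0\} \subseteq \{b\} \ \land\ loopinv(0) \subseteq \{\neg b\} \ \land\ \forall s \in loopinv(\alpha).\ (s, t) \in R \longrightarrow \exists \beta \leq \alpha.\ t \in loopinv(\beta)$$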


5.2 Correctness Specification 


Using the compositional reasoning of π-Core, correctness of Zephyr memory
management can be specified and verified with the rely-guarantee specification
of each event. The functional correctness of a kernel service is specified by its
pre/post-conditions. Invariant preservation, memory configuration, and separation
of local variables are specified in the guarantee condition of each service.
The guarantee condition for both memory services is defined as:
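$$\mathit{Mem\_pool\_alloc\_guar}\ t \equiv \underbrace{Id}_{(1)} \cup \Big(\underbrace{gvars\_conf\_stable}_{(2)} \cap \{(s, r).\ \underbrace{(cur\ s \neq Some\ t \longrightarrow gvars\_nochange\ s\ r \land lvars\_nochange\ t\ s\ r)}_{(3.1)} \land \underbrace{(cur\ s = Some\ t \longrightarrow inv\ s \longrightarrow inv\ r)}_{(3.2)} \land \underbrace{(\forall t'.\ t' \neq t \longrightarrow lvars\_nochange\ t'\ s\ r)}_{(4)}\}\Big)$$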


This relation states that the alloc and free services may leave the state unchanged (1), e.g., at a blocked Await or when selecting a branch of a conditional statement. If they do change the state, then: (2) the static configuration of memory pools in the model does not change; (3.1) if the scheduled thread is not the thread invoking the event, then the global variables and the local variables of the invoking thread do not change (since it is blocked in an Await, as explained in Sect. 3); (3.2) if it is, then the relation preserves the memory invariant, and consequently each step of the event needs to preserve the invariant; (4) a thread does not change the local variables of other threads.
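The case analysis of the guarantee can be made concrete by checking a single transition (s, r) against a relation of the same shape. The Java sketch below is a hypothetical illustration, not the Isabelle/HOL model. The KState class, its fields (a string for the static pool configuration, one integer per thread standing in for its local variables), and the invariant predicate passed to AllocGuarantee.guar are all simplifying assumptions. Each branch is tagged with the clause number it mirrors.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, heavily simplified kernel state; not the Isabelle/HOL model.
final class KState {
    final Integer cur;                  // scheduled thread, null for None
    final String poolConfig;            // static memory-pool configuration
    final Map<String, Integer> globals; // global (shared) variables
    final Map<Integer, Integer> locals; // thread id -> that thread's local state

    KState(Integer cur, String poolConfig, Map<String, Integer> globals, Map<Integer, Integer> locals) {
        this.cur = cur;
        this.poolConfig = poolConfig;
        this.globals = globals;
        this.locals = locals;
    }
}

final class AllocGuarantee {
    static boolean lvarsNochange(int t, KState s, KState r) {
        return Objects.equals(s.locals.get(t), r.locals.get(t));
    }

    // Does the step (s, r) satisfy a relation shaped like Mem-pool-alloc-guar t?
    static boolean guar(int t, KState s, KState r, Predicate<KState> inv) {
        boolean id = Objects.equals(s.cur, r.cur) && s.poolConfig.equals(r.poolConfig)
                && s.globals.equals(r.globals) && s.locals.equals(r.locals);
        if (id) return true;                                            // (1) Id
        if (!s.poolConfig.equals(r.poolConfig)) return false;           // (2)
        boolean running = Objects.equals(s.cur, t);
        if (!running && !(s.globals.equals(r.globals) && lvarsNochange(t, s, r))) {
            return false;                                               // (3.1)
        }
        if (running && inv.test(s) && !inv.test(r)) return false;       // (3.2)
        Set<Integer> threads = new HashSet<>(s.locals.keySet());
        threads.addAll(r.locals.keySet());
        for (int other : threads) {                                     // (4)
            if (other != t && !lvarsNochange(other, s, r)) return false;
        }
        return true;
    }
}
```

In this sketch, a step by the running thread that updates globals and its own locals while keeping the invariant is accepted. A step that also touches another thread's locals is rejected by clause (4).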

Using the π-Core proof rules, we verify that the invariant introduced in Sect. 4 is preserved by all the events. Additionally, we prove the following. When starting in a valid memory configuration given by the invariant, if the service does not return an error code, then it returns a valid memory block whose size is greater than or equal to the requested capacity. The property is specified by the following precondition and postcondition:


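$$
\begin{aligned}
\textit{Mem-pool-alloc-pre}\ t \equiv{}& \{s.\ \textit{inv}\ s \wedge \textit{allocating-node}\ s\ t = \textit{None} \wedge \textit{freeing-node}\ s\ t = \textit{None}\}\\[4pt]
\textit{Mem-pool-alloc-post}\ t\ p\ sz\ \textit{timeout} \equiv{}& \{s.\ \textit{inv}\ s \wedge \textit{allocating-node}\ s\ t = \textit{None} \wedge \textit{freeing-node}\ s\ t = \textit{None}\\
&\wedge (\textit{timeout} = \textit{FOREVER} \longrightarrow\\
&\qquad (\textit{ret}\ s\ t = \textit{ESIZEERR} \wedge \textit{mempoolalloc-ret}\ s\ t = \textit{None}\ \vee\\
&\qquad\ \textit{ret}\ s\ t = \textit{OK} \wedge (\exists \textit{mblk}.\ \textit{mempoolalloc-ret}\ s\ t = \textit{Some}\ \textit{mblk} \wedge \textit{mblk-valid}\ s\ p\ sz\ \textit{mblk})))\\
&\wedge (\textit{timeout} = \textit{NOWAIT} \longrightarrow\\
&\qquad ((\textit{ret}\ s\ t = \textit{ENOMEM} \vee \textit{ret}\ s\ t = \textit{ESIZEERR}) \wedge \textit{mempoolalloc-ret}\ s\ t = \textit{None})\ \vee\\
&\qquad (\textit{ret}\ s\ t = \textit{OK} \wedge (\exists \textit{mblk}.\ \textit{mempoolalloc-ret}\ s\ t = \textit{Some}\ \textit{mblk} \wedge \textit{mblk-valid}\ s\ p\ sz\ \textit{mblk})))\\
&\wedge (\textit{timeout} > 0 \longrightarrow\\
&\qquad ((\textit{ret}\ s\ t = \textit{ETIMEOUT} \vee \textit{ret}\ s\ t = \textit{ESIZEERR}) \wedge \textit{mempoolalloc-ret}\ s\ t = \textit{None})\ \vee\\
&\qquad (\textit{ret}\ s\ t = \textit{OK} \wedge (\exists \textit{mblk}.\ \textit{mempoolalloc-ret}\ s\ t = \textit{Some}\ \textit{mblk} \wedge \textit{mblk-valid}\ s\ p\ sz\ \textit{mblk})))\}
\end{aligned}
$$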


If a thread requests a memory block in FOREVER mode, it may either successfully allocate a valid memory block or fail with ESIZEERR if the requested size is larger than the size of the memory pool. If the thread requests a memory block in NOWAIT mode, it may also get the result ENOMEM if there are no available blocks. If the thread requests a block with a timeout (timeout > 0), it may instead get the result ETIMEOUT if no block becomes available within timeout milliseconds.

The property is admittedly weak. Even if the memory has a block that can satisfy the requested size before the allocation service is invoked, another concurrently running thread may take that block first during the execution of the service. For the same reason, a released block may be taken by another concurrent thread before the release service ends.
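The mode-dependent part of Mem-pool-alloc-post can be read as a runtime check on what a single call returns. The Java sketch below is a hypothetical illustration under explicit assumptions:

- The integer encodings of FOREVER and NOWAIT are chosen only for the example.
- A returned block is reduced to its size, with null standing for None.
- mblk-valid is approximated by "at least the requested size".
- The invariant conjuncts are omitted.

```java
// Hypothetical runtime reading of the mode-dependent part of Mem-pool-alloc-post.
class AllocPostcondition {
    enum Ret { OK, ENOMEM, ESIZEERR, ETIMEOUT }

    static final int FOREVER = -1; // assumed encoding, for illustration only
    static final int NOWAIT = 0;   // assumed encoding, for illustration only

    // blockSize == null models mempoolalloc-ret s t = None; a returned block is
    // treated as valid when it is at least as large as the request.
    static boolean holds(int timeout, int requested, Ret ret, Integer blockSize) {
        boolean success = ret == Ret.OK && blockSize != null && blockSize >= requested;
        boolean noBlock = blockSize == null;
        if (timeout == FOREVER) {
            return success || (ret == Ret.ESIZEERR && noBlock);
        }
        if (timeout == NOWAIT) {
            return success || ((ret == Ret.ENOMEM || ret == Ret.ESIZEERR) && noBlock);
        }
        if (timeout > 0) {
            return success || ((ret == Ret.ETIMEOUT || ret == Ret.ESIZEERR) && noBlock);
        }
        return true; // no mode clause applies, as with the vacuous implications
    }
}
```

Because the three modes are mutually exclusive, the conjunction of implications in the postcondition collapses to a single branch per call in this sketch.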


5.3 Correctness Proof 


In the π-Core system, verifying that a rely-guarantee specification establishes a property is carried out by inductively applying the proof rules to each system event and discharging the proof obligations that the rules generate. Typically, these proof obligations require proving the stability of the pre- and post-conditions, i.e., that changes made by the environment preserve them. They also require showing that a statement executed from a state satisfying the precondition reaches a state belonging to the postcondition.

To prove termination of the loop statement in free_block shown in Fig. 4, we define the loop invariant with the logical variable $\alpha$ as follows.


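$$
\begin{aligned}
\textit{mp-free-loopinv}\ t\ b\ \alpha = \{\ldots\ &\wedge \acute{}\textit{inv} \wedge \textit{level}\ b < \textit{length}\ (\acute{}\textit{lsizes}\ t)\\
&\wedge (\forall ii < \textit{length}\ (\acute{}\textit{lsizes}\ t).\ \acute{}\textit{lsizes}\ t\ !\ ii = (\textit{max-sz}\ (\acute{}\textit{mem-pool-info}\ (\textit{pool}\ b)))\ \textit{div}\ 4^{ii})\\
&\wedge \acute{}\textit{bn}\ t < \textit{length}\ (\textit{bits}\ (\textit{levels}\ (\acute{}\textit{mem-pool-info}\ (\textit{pool}\ b))\ !\ (\acute{}\textit{lvl}\ t)))\\
&\wedge \acute{}\textit{bn}\ t = (\textit{block}\ b)\ \textit{div}\ 4^{\textit{level}\ b - \acute{}\textit{lvl}\ t} \wedge \acute{}\textit{lvl}\ t \le \textit{level}\ b\\
&\wedge (\acute{}\textit{free-block-r}\ t \longrightarrow (\exists \textit{blk}.\ \acute{}\textit{freeing-node}\ t = \textit{Some}\ \textit{blk} \wedge \textit{pool}\ \textit{blk} = \textit{pool}\ b\\
&\qquad \wedge \textit{level}\ \textit{blk} = \acute{}\textit{lvl}\ t \wedge \textit{block}\ \textit{blk} = \acute{}\textit{bn}\ t)\\
&\qquad \wedge \acute{}\textit{alloc-memblk-data-valid}\ (\textit{pool}\ b)\ (\textit{the}\ (\acute{}\textit{freeing-node}\ t)))\\
&\wedge (\neg\,\acute{}\textit{free-block-r}\ t \longrightarrow \acute{}\textit{freeing-node}\ t = \textit{None})\}\\
&\cap \{\alpha = (\textit{if}\ \acute{}\textit{freeing-node}\ t \neq \textit{None}\ \textit{then}\ \acute{}\textit{lvl}\ t + 1\ \textit{else}\ 0)\}
\end{aligned}
$$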


Here freeing_node and lvl are local variables storing, respectively, the node being freed and the level that the node belongs to. In the body of the loop, if lvl t > 0 and partner_bit is true, then lvl t is decremented by one at the end of the body; otherwise, freeing_node t = None. So at the end of the loop body, either $\alpha$ decreases or $\alpha = 0$. If $\alpha = 0$, then freeing_node t = None. By the invariant, the negation of the loop condition, $\neg$free_block_r t, therefore holds, which concludes the termination of free_block.
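The termination argument can be exercised on a small model. The Java sketch below is hypothetical. It abstracts the buddy bitmap into a caller-supplied predicate partnersFree(lvl, bn) and keeps only the variables the argument uses: lvl, bn, whether a node is being freed, and the loop flag free_block_r. It recomputes $\alpha$ before and after each iteration and fails if $\alpha$ does not strictly decrease, mirroring the premise $\exists \beta < \alpha$ of the loop rule.

```java
import java.util.function.BiPredicate;

// Hypothetical abstraction of the free_block loop; the buddy bitmap is
// replaced by a caller-supplied predicate partnersFree(lvl, bn).
class FreeBlockLoop {
    // The logical variable alpha: lvl + 1 while a node is being freed, else 0.
    static int alpha(boolean freeing, int lvl) {
        return freeing ? lvl + 1 : 0;
    }

    // Returns the level at which the (possibly merged) block is put on a free list.
    static int freeBlock(int level, int block, BiPredicate<Integer, Integer> partnersFree) {
        int lvl = level;            // lvl t, starts at level b
        int bn = block;             // bn t, block index at level lvl
        boolean freeing = true;     // freeing_node t != None
        boolean freeBlockR = true;  // loop condition free_block_r t
        int insertedAt = -1;
        while (freeBlockR) {
            int before = alpha(freeing, lvl);
            if (lvl > 0 && partnersFree.test(lvl, bn)) {
                lvl = lvl - 1;      // merge with the partners, move one level up
                bn = bn / 4;
            } else {
                insertedAt = lvl;   // add the block to the free list at lvl
                freeing = false;    // freeing_node t = None
                freeBlockR = false;
            }
            if (alpha(freeing, lvl) >= before) {
                throw new IllegalStateException("variant alpha did not decrease");
            }
        }
        return insertedAt;          // here alpha = 0 and the loop condition is false
    }
}
```

For example, freeBlock(3, 37, (l, b) -> true) merges three times and returns 0, while a predicate that is always false returns the starting level 3 after one iteration.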

Due to concurrency, fairness must be considered to prove termination of the loop statement in k_mem_pool_alloc from Line 23 to 33 in Fig. 1. On the one hand, when a thread requests a memory block in FOREVER mode, no block may ever become available if other threads never release their allocated blocks. On the other hand, even when other threads release blocks, the available blocks may always be taken first by competing threads.


6 Evaluation and Results 


Evaluation. The verification conducted in this work targets Zephyr v1.8.0, released in 2017. The C code of the buddy memory management is about 400 lines, not counting blank lines and comments. Table 1 shows statistics for the effort and size of the proofs in the Isabelle/HOL theorem prover, measured in lines of specification and lines of proof (LOS/LOP). In total, the models and mechanized verification consist of about 28,000 lines of specification and proofs, and the total effort is about 12 person-months. The specification and proof of π-Core are reusable for the verification of other systems.


Table 1. Specification and proof statistics 


π-Core language                              | Memory management
Item                             LOS/LOP     | Item                             LOS/LOP
Language and proof rules             700     | Specification                        400
Lemmas of language/semantics       3,000     | Auxiliary lemmas/invariant         1,700
Soundness                          7,100     | Proof of allocation               10,600
Invariant                            100     | Proof of free                      4,950
Total                             10,900     | Total                             17,650


Bugs in Zephyr. During the formal verification, we found 3 bugs in the C code of Zephyr. The first two bugs are critical and have been repaired in the latest release of Zephyr. To avoid the third one, callers of k_mem_pool_alloc have to constrain the size argument.
(1) Incorrect block split: this bug is located in the loop at Line 11 of the k_mem_pool_alloc service, shown in Fig. 1. The level_empty function checks whether the free list of pool p at level alloc_l is empty. Concurrent threads may release a memory block at that level, making the call level_empty(p, alloc_l) return false and stopping the loop. In that case, the service allocates a memory block of a bigger capacity at a level i, but it still sets the level number of the block to alloc_l at Line 15. The service thus allocates a larger block to the requesting thread, causing an internal fragmentation of $\mathit{max\_sz}/4^{i} - \mathit{max\_sz}/4^{\mathit{alloc\_l}}$ bytes. When this block is released, it is inserted into the free list at level alloc_l rather than at level i, causing an external fragmentation of $\mathit{max\_sz}/4^{i} - \mathit{max\_sz}/4^{\mathit{alloc\_l}}$ bytes. The bug is fixed by removing the condition level_empty(p, alloc_l) in our specification.
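As a worked instance of the fragmentation bound, assume (purely for illustration) max_sz = 4096 bytes, a request served at alloc_l = 2 (256-byte blocks), and a loop that stops early at level i = 1 (1024-byte blocks). Block sizes follow max_sz div 4^level, as in the loop invariant above:

$$
\frac{\mathit{max\_sz}}{4^{i}} - \frac{\mathit{max\_sz}}{4^{\mathit{alloc\_l}}} = \frac{4096}{4^{1}} - \frac{4096}{4^{2}} = 1024 - 256 = 768 \text{ bytes}
$$

Under the same assumptions, stopping at i = 0 would waste 4096 − 256 = 3840 bytes.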

(2) Incorrect return from k_mem_pool_alloc: this bug is found at Line 26 in Fig. 1. When a suitable free block is allocated by another thread, the pool_alloc function returns EAGAIN at Line 9 to ask the thread to retry the allocation. When a thread invokes k_mem_pool_alloc in FOREVER mode and this case happens, the service returns EAGAIN immediately. However, a thread invoking k_mem_pool_alloc in FOREVER mode should keep retrying as long as it does not succeed. We repair the bug by removing the condition ret == EAGAIN at Line 26. As explained in the comments of the C code, EAGAIN should not be returned to threads invoking the service. Moreover, the return of EAGAIN at Line 34 actually corresponds to a timeout. Thus, we introduce a new return code ETIMEOUT in our specification.

(3) Non-termination of k_mem_pool_alloc: as discussed in Sect. 5.3, the loop statement at Lines 23-33 in Fig. 1 does not terminate in general. However, it should terminate in certain cases, and the C code fails to do so. When a thread requests a memory block in FOREVER mode and the requested size is larger than max_sz, the maximum size of blocks, the loop at Lines 23-33 in Fig. 1 never finishes, since pool_alloc always returns ENOMEM. The reason is that the "return ENOMEM" at Line 6 does not distinguish two cases, alloc_l < 0 and free_l < 0. In the first case, the requested size is larger than max_sz and the kernel service should return immediately. In the second case, there are no free blocks larger than the requested size, and the service keeps trying until some free block becomes available. We repair the bug by splitting the if statement at Lines 4-7 into these two cases and introducing a new return code ESIZEERR in our specification. Then, we change the condition at Lines 25-26 to check whether the returned value is ESIZEERR instead of ENOMEM.
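The repaired control flow described in bugs (2) and (3) can be sketched in Java as follows. This is a hypothetical model, not the Zephyr C code or the paper's specification. The level search uses the block sizes of the loop invariant (max_sz div 4^i at level i). The array hasFree[i] is an assumed stand-in for a non-empty free list at level i, and the blocking wait between retries is omitted.

```java
import java.util.function.Supplier;

// Hypothetical model of the repaired k_mem_pool_alloc control flow.
class MemPoolSketch {
    enum Ret { OK, ENOMEM, ESIZEERR, EAGAIN, ETIMEOUT }

    // Level search over block sizes maxSz / 4^i (i = 0 .. nLevels-1).
    // hasFree[i] stands in for "the free list at level i is non-empty".
    static Ret poolAlloc(int maxSz, int nLevels, boolean[] hasFree, int size) {
        int allocL = -1; // deepest level whose blocks still fit the request
        int freeL = -1;  // deepest such level that has a free block to split
        int lvlSize = maxSz;
        for (int i = 0; i < nLevels && size <= lvlSize; i++) {
            allocL = i;
            if (hasFree[i]) freeL = i;
            lvlSize /= 4;
        }
        if (allocL < 0) return Ret.ESIZEERR; // request exceeds max_sz: give up
        if (freeL < 0) return Ret.ENOMEM;    // nothing free right now: caller may wait
        return Ret.OK;                       // splitting from freeL down to allocL elided
    }

    // FOREVER mode: retry on ENOMEM or EAGAIN; only OK and ESIZEERR are returned.
    static Ret allocForever(Supplier<Ret> attempt) {
        while (true) {
            Ret r = attempt.get();
            if (r == Ret.OK || r == Ret.ESIZEERR) return r;
            // the real service would block here until a block is released
        }
    }
}
```

Separating ESIZEERR from ENOMEM is what lets allocForever exit on an oversized request instead of looping forever.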


7 Conclusion and Future Work 


In this paper, we have developed a formal specification, at the low-level design, of the concurrent buddy memory management of Zephyr RTOS. Using the rely-guarantee technique in the π-Core framework, we have formally verified a set of critical properties for OS kernels, such as invariant preservation and preservation of memory configuration. Finally, we identified some critical bugs in the C code of Zephyr.

Our work explores the challenges and cost of certifying concurrent OSs for the highest-level assurance. The definition of properties and rely-guarantee relations is complex, and the verification task becomes expensive: at the low-level design, the specification and proofs (LOS/LOP) are about 40 times the size of the C code. Next, we plan to verify other modules of Zephyr, which may be easier due to simpler data structures and algorithms. For the purpose of fully formal verification of OSs at the source-code level, we will replace the imperative language in π-Core with a more expressive one and add a verification condition generator (VCG) to reduce the cost of the verification.
Abstract. High-performance multithreaded software often relies on
optimized implementations of common abstract data types (ADTs) like 
counters, key-value stores, and queues, i.e., concurrent objects. By using 
fine-grained and non-blocking mechanisms for efficient inter-thread syn- 
chronization, these implementations are vulnerable to violations of ADT- 
consistency which are difficult to detect: bugs can depend on specific 
combinations of method invocations and argument values, as well as 
rarely-occurring thread interleavings. Even given a bug-triggering inter- 
leaving, detection generally requires unintuitive test assertions to capture 
inconsistent combinations of invocation return values. 

In this work we describe the Violat tool for generating tests that 
witness violations to atomicity, or weaker consistency properties. Violat 
generates self-contained and efficient programs that test observational 
refinement, i.e., substitutability of a given ADT with a given implemen- 
tation. Our approach is both sound and complete in the limit: for every 
consistency violation there is a failed execution of some test program, 
and every failed test signals an actual consistency violation. In practice 
we compromise soundness for efficiency via random exploration of test 
programs, yielding probabilistic soundness instead. Violat’s tests reliably 
expose ADT-consistency violations using off-the-shelf approaches to con- 
current test validation, including stress testing and explicit-state model 
checking. 


1 Introduction 


Many mainstream software platforms including Java and .NET support mul- 
tithreading to enable parallelism and reactivity. Programming multithreaded 
code effectively is notoriously hard, and prone to data races on shared memory 
accesses, or deadlocks on the synchronization used to protect accesses. Rather 
than confronting these difficulties, programmers generally prefer to leverage 
libraries providing concurrent objects [19,29], i.e., optimized thread-safe imple- 
mentations of common abstract data types (ADTs) like counters, key-value 
stores, and queues. For instance, Java’s concurrent collections include implementations which eschew the synchronization bottlenecks associated with lock-based mutual exclusion, opting instead for non-blocking mechanisms [28] provided by hardware operations like atomic compare and exchange.


© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 534-546, 2019.
https://doi.org/10.1007/978-3-030-25543-5_30


Violat: Generating Tests of Observational Refinement 535

Concurrent object implementations are themselves vulnerable to elusive bugs: 
even with effective techniques for exploring the space of thread interleavings, like 
stress testing or model checking [7,30,47], bugs often depend on specific combi- 
nations of method invocations and argument values. Furthermore, even recogniz- 
ing whether a given execution is correct is non-trivial, since recognition generally 
requires unintuitive test assertions to identify inconsistent combinations of return 
values. Technically, correctness amounts to observational refinement [18,21,32], 
which captures the substitutability of an ADT with an implementation [23]: any 
combination of values admitted by a given implementation is also admitted by 
the given ADT specification. 

In this work we describe an approach to generating tests of observational 
refinement for concurrent objects, as implemented by the Violat tool, which we 
use to discover violations to atomicity (and weaker consistency properties) in 
widely-used concurrent objects [9,10,12]. Unlike previous approaches based on 
linearizability [4,20,46], Violat generates self-contained test programs which do 
not require enumerating linearizations dynamically per execution, instead stati- 
cally precomputing the ADT-admitted return-value outcomes per test program, 
once, prior to testing. Despite this optimization, the approach is both sound and
complete in the limit, i.e., for every consistency violation there is a failed
execution of some test program, and every failed test witnesses an actual consis- 
tency violation. In practice, we compromise soundness for efficiency via random 
exploration of test programs, achieving probabilistic soundness instead. 

Besides improving the efficiency of test execution, Violat’s self-contained 
tests can be validated by both stress testers and model checkers, and double 
as regression and conformance tests. Our previous works [9,10,12] demonstrate 
that Violat’s tests reliably expose ADT-consistency violations in Java implemen- 
tations using the Java Concurrency Stress testing tool [42]. In particular, Violat 
has uncovered atomicity violations in over 50 methods from Java’s concurrent 
collections; many of these violations seem to correspond with their documen- 
tations’ mention of weakly-consistent behavior, while others indicate confirmed 
implementation bugs, which we have reported. 

Previous work used Violat in empirical studies, without artifact evaluation 
[9,10,12]. This article is the first to consider Violat itself for evaluation, the first 
to describe its implementation and usage, and includes several novel extensions. 
For instance, in addition to stress testing, Violat now includes an integration with 
Java Pathfinder [47]; besides enabling complete systematic coverage of a given test 
program, this integration enables the output of the execution traces leading to con- 
sistency violations, thus facilitating diagnosis and repair. Furthermore, Violat is 
now capable of generating tests of any user-provided implementation, in addition 
to those distributed with Java. 
2 Overview of Test Generation with Violat


Violat generates self-contained programs to test the observational refinement of 
a given concurrent object implementation with respect to its abstract data type 
(ADT), according to Fig. 1. While its methodology is fairly platform agnos- 
tic, Violat currently integrates with the Java platform. Accordingly, its input 
includes the fully-qualified name of a single Java class, which is assumed to 
be available either on the system classpath, or in a user-provided Java archive 
(JAR); its output is a sequence of Java classes which can be tested with off- 
the-shelf back-end analysis engines, including the Java Concurrency Stress test- 
ing tool [42] and Java Pathfinder [47]. Our current implementation integrates 
directly with both back-ends, and thus reports test results directly, signaling 
any discovered consistency violations. 


[Figure 1 diagram. Input to Violat: class name, classpath, JAR files, spec generator. Stages: Enumerate Schemas, Calculate Outcomes, Generate Code. Intermediate artifacts: Test Schemas, Annotated Schemas, Test Programs. Back end: Backend Tester.]


Fig. 1. Violat generates tests by enumerating program schemas invoking a given con- 
current object, annotating those schemas with the expected outcomes of invocations 
according to ADT specifications, and translating annotated schemas to executable 
tests. 


Violat generates tests according to a three-step pipeline. The first step, described in Sect. 3, enumerates test program schemas, i.e., concise descriptions of programs as parallel sequences of invocations of the given concurrent object’s methods. For example, Fig. 2 lists several test schemas for Java’s ConcurrentHashMap. The second step, described in Sect. 4, annotates each schema with a set of expected outcomes, i.e., the combinations of return values among the given schema’s invocations which are admitted according to the given object’s ADT specification. The final step, described in Sect. 5, translates each schema into a self-contained¹ Java class.

Technically, to guide the enumeration of schemas and calculation of out- 
comes, Violat requires a specification of the given concurrent object, describing 
constructor and method signatures. While this could be generated automatically 
from the object’s bytecode, our current implementation asks the user to input 
this specification in JSON format. By additionally indicating whether meth- 
ods are read-only or weakly-consistent, the user can provide additional hints to 


1 The generated class imports only a given concurrent object, and a few basic 
java.util classes. 
improve schema enumeration and outcome calculation. For instance, excessive
generation of programs with only read-only methods is unlikely to uncover consis- 
tency violations, and weakly-consistent ADT methods generally allow additional 
outcomes — see Emmi and Enea [12]. Furthermore, Violat attempts to focus the 
blame for discovered violations by constructing tests with a small number of 
specified untrusted methods, e.g., just one. 


3 Test Enumeration 


To enumerate test programs effectively, Violat considers a simple representation 
of program schemas, as depicted in Fig. 2. We write schemas with a familiar nota- 
tion, as parallel compositions {...}||{...} of method-invocation sequences. 
Intuitively, schemas capture parallel threads invoking sequences of methods of 
a given concurrent object. Besides the parallelism, these schemas include only 
trivial control and data flow. For instance, we exclude conditional statements 
and loops, as well as passing return values as arguments, in favor of straight-line 
code with literal argument values. Nevertheless, this simple notion is expressive 
enough to capture any possible outcome, i.e., combination of invocation return 
values, of programs with arbitrarily complex control flow, data flow, and synchronization. To see this, consider any outcome y admitted by some execution of a program with arbitrarily complex control and data flow in which methods are invoked with argument values x, collectively. Now consider the schema in which each thread invokes the same methods as the corresponding thread of the original program, with the literal values x. This schema is guaranteed to admit the same outcome y.
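The flattening argument can be illustrated on a single thread. In the hypothetical Java sketch below, SchemaRecorder wraps a ConcurrentHashMap and logs each invocation with its literal arguments. The demo body uses a loop, a conditional, and a return value passed as an argument, yet what it actually performs is a straight-line sequence of invocations, which is exactly one thread of a schema. SchemaRecorder is not part of Violat; a concurrent version would keep one log per thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper: records each invocation with its literal arguments.
class SchemaRecorder {
    private final ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
    private final List<String> trace = new ArrayList<>();

    Integer put(int k, int v) {
        trace.add("put(" + k + "," + v + ")");
        return map.put(k, v);
    }

    Integer get(int k) {
        trace.add("get(" + k + ")");
        return map.get(k);
    }

    // The recorded invocations in the schema notation of Fig. 2.
    String asSchemaThread() {
        return "{ " + String.join("; ", trace) + " }";
    }

    // A thread body with a loop, a conditional, and data flow between calls.
    static String demo() {
        SchemaRecorder r = new SchemaRecorder();
        for (int i = 0; i < 2; i++) {
            Integer old = r.put(i, 1);
            if (old == null) {
                int v = r.get(0);   // a return value ...
                r.put(1 - i, v);    // ... flows into an argument
            }
        }
        return r.asSchemaThread();
    }
}
```

Running demo() yields { put(0,1); get(0); put(1,1); put(1,1) }, a straight-line thread with literal arguments.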


java.util.concurrent.ConcurrentHashMap


Schema / Method                                                        Outcome
{ put(0,0); put(1,1); put(1,1) } || { put(0,1); clear() }              N,N,N,N,()
{ put(0,0); remove(1) } || { put(1,0); contains(0) }                   N,0,N,F
{ get(1); containsValue(1) } || { put(1,1); put(0,1); put(1,0) }       1,F,N,N,1
{ put(0,1); put(1,0) } || { elements() }                               N,N,[0]
{ put(0,1); put(1,0) } || { entrySet() }                               N,N,[1=0]
{ put(1,1) } || { put(1,2); isEmpty() }                                N,1,T
{ put(0,1); put(1,1) } || { keySet() }                                 N,N,[1]
{ keys() } || { put(0,1); put(1,1) }                                   [1],N,N
{ put(1,0); put(1,1); mappingCount() } || { remove(1) }                N,N,2,0
{ put(1,0); put(1,1); size() } || { remove(1) }                        N,N,2,0
{ put(0,1); put(1,1) } || { toString() }                               N,N,{1=1}
{ put(0,1); put(1,0) } || { values() }                                 N,N,[0]


Fig. 2. Program schemas generated by Violat for Java’s ConcurrentHashMap class, along with outcomes which are observed in testing, yet not predicted by Violat.


For a given concurrent object, Violat enumerates schemas according to 
a few configurable parameters, including bounds on the number of threads, 
invocations, and (primitive) values. By default, Violat generates schemas with
exactly 2 threads, between 3 and 6 invocations, and exactly 2 values. While our 
initial implementation enumerated schemas systematically according to a well- 
defined order, empirically we found that this strategy spends too much time in 
neighborhoods of uninteresting schemas, i.e., which do not expose violations. Ulti- 
mately we adopted a pseudorandom enumeration which constructs each schema 
independently by randomly choosing the number of threads, invocations, and val- 
ues, within the given parameter bounds, and randomly populating threads with 
invocations. Methods are selected according to a weighted random choice, in which 
the weights of read-only and untrusted methods are 1; trusted mutator methods have
weight 3. The read-only and trusted designations are provided by class specifica- 
tions — see Sect. 2. Integer argument values are chosen randomly between 0 and 1, 
according to the default value bound; generic-typed arguments are assumed to be 
integers. Collection and map values are constructed from randomly-chosen integer 
values, up to size 2. In principle, all of these bounds are configurable, but we have 
found these defaults to work reasonably well. 
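A minimal Java sketch of this pseudorandom construction follows, under stated assumptions. The Method class, its arity field, and the policy of seeding each of the two threads with one invocation before placing the rest at random are illustrative choices, not Violat's implementation (which is a Node.js application). Only the weights (1 for read-only and untrusted methods, 3 for trusted mutators), the thread count, and the value bound come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.StringJoiner;

// Hypothetical sketch of pseudorandom schema construction (not Violat's code).
class SchemaEnumerator {
    static final class Method {
        final String name;
        final int arity;         // number of integer arguments
        final boolean readOnly;  // from the class specification
        final boolean trusted;   // from the class specification

        Method(String name, int arity, boolean readOnly, boolean trusted) {
            this.name = name;
            this.arity = arity;
            this.readOnly = readOnly;
            this.trusted = trusted;
        }

        // Read-only and untrusted methods weigh 1; trusted mutators weigh 3.
        int weight() {
            return (readOnly || !trusted) ? 1 : 3;
        }
    }

    // Weighted choice: draw uniformly in [0, total) and walk the cumulative weights.
    static Method choose(List<Method> methods, Random rnd) {
        int total = 0;
        for (Method m : methods) total += m.weight();
        int pick = rnd.nextInt(total);
        for (Method m : methods) {
            pick -= m.weight();
            if (pick < 0) return m;
        }
        throw new IllegalStateException("unreachable when all weights are positive");
    }

    // A two-thread schema with minInv..maxInv invocations (minInv >= 2) and values in [0, values).
    static String randomSchema(List<Method> methods, Random rnd, int minInv, int maxInv, int values) {
        int n = minInv + rnd.nextInt(maxInv - minInv + 1);
        List<List<String>> threads = new ArrayList<>();
        threads.add(new ArrayList<>());
        threads.add(new ArrayList<>());
        for (int k = 0; k < n; k++) {
            Method m = choose(methods, rnd);
            StringJoiner call = new StringJoiner(",", m.name + "(", ")");
            for (int a = 0; a < m.arity; a++) call.add(Integer.toString(rnd.nextInt(values)));
            int t = k < 2 ? k : rnd.nextInt(2);   // assumption: keep both threads non-empty
            threads.get(t).add(call.toString());
        }
        StringJoiner schema = new StringJoiner(" || ");
        for (List<String> thread : threads) schema.add("{ " + String.join("; ", thread) + " }");
        return schema.toString();
    }
}
```

With the defaults from the text (2 threads, 3 to 6 invocations, values 0 and 1), a call such as randomSchema(methods, rnd, 3, 6, 2) produces strings in the notation of Fig. 2. Trusted mutators are drawn three times as often as each read-only or untrusted method.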

Note that while the manifestation of a given concurrency bug can, in prin- 
ciple, rely on large bounds on threads, invocations, and values, recent studies 
demonstrate that the majority (96%) can be reproduced with just 2 threads [25]. 
Furthermore, while our current implementation adheres to the simple notion of 
schema in which all threads execute in parallel, Violat can easily be extended
to handle a more complex notion of schema in which threads are partially 
ordered, thus capturing arbitrary program synchronization. Nevertheless, this 
simple notion seems effective at exposing violations without requiring additional 
synchronization — see Emmi and Enea [12, Section 5.2]. 


4 Computing Expected Outcomes 


To capture violations to observational refinement, Violat computes the set of 
expected outcomes, i.e., those admitted by a given concurrent object’s abstract 
data type (ADT), for each program schema. Violat essentially follows the app- 
roach of Line-Up [4] by computing expected outcomes from sequential executions 
of the given implementation. While this approach assumes that the sequential 
behavior of a given implementation does adhere to its implicit ADT specification 
—and that the outcomes of concurrent executions are also outcomes of sequen- 
tial executions — there is typically no practical alternative, since behavioral ADT 
specifications are rarely provided. 

Violat computes the expected outcomes of a given schema once, by enumer- 
ating all possible shuffles of threads’ invocations, and recording the return values 
of each shuffle when executed by the given implementation. For instance, there 
are 10 ways to shuffle the threads of the schema 


{ get(1); containsValue(1) } || { put(1,1); put(0,1); put(1,0) }
from Fig. 2, including the sequence 


get(1); put(1,1); put(0,1); put(1,0); containsValue(1). 
Executing Java’s ConcurrentHashMap on this shuffle yields the values null,
null, null, 1, and true, respectively. To construct the generated outcome, Violat 
reorders the return values according to the textual order of their correspond- 
ing invocations in the given schema; since containsValue is second in this order, 
after get, the generated outcome is null, true, null, null, 1. Among the 10 pos- 
sible shuffles of this schema, there are only four unique outcomes — shown later 
in Figs. 3 and 4. 
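The outcome computation for this schema can be reproduced with a short Java sketch. It enumerates every shuffle of the two threads, runs each one sequentially on a fresh ConcurrentHashMap, and stores each return value at its invocation's textual position. The class and method names are hypothetical. Only the schema, the use of the implementation as its own sequential oracle, and the reordering rule come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class ExpectedOutcomes {
    // A literal invocation on the object under test, returning its result.
    interface Invocation extends Function<ConcurrentHashMap<Integer, Integer>, Object> {}

    // Outcome of every shuffle (duplicates kept), in schema text order.
    static List<String> allShuffles(List<Invocation> t1, List<Invocation> t2) {
        List<String> outcomes = new ArrayList<>();
        shuffle(t1, t2, 0, 0, new ArrayList<>(), outcomes);
        return outcomes;
    }

    private static void shuffle(List<Invocation> t1, List<Invocation> t2, int i, int j,
                                List<Integer> order, List<String> outcomes) {
        if (i == t1.size() && j == t2.size()) {
            outcomes.add(execute(t1, t2, order));
            return;
        }
        if (i < t1.size()) {            // next step taken from thread 1
            order.add(i);
            shuffle(t1, t2, i + 1, j, order, outcomes);
            order.remove(order.size() - 1);
        }
        if (j < t2.size()) {            // next step taken from thread 2
            order.add(t1.size() + j);
            shuffle(t1, t2, i, j + 1, order, outcomes);
            order.remove(order.size() - 1);
        }
    }

    // Runs one shuffle sequentially on a fresh map; results are stored by textual
    // position (thread 1's invocations first), not by execution order.
    private static String execute(List<Invocation> t1, List<Invocation> t2, List<Integer> order) {
        ConcurrentHashMap<Integer, Integer> obj = new ConcurrentHashMap<>();
        String[] results = new String[t1.size() + t2.size()];
        for (int k : order) {
            Invocation inv = k < t1.size() ? t1.get(k) : t2.get(k - t1.size());
            results[k] = String.valueOf(inv.apply(obj));
        }
        return String.join(", ", results);
    }

    // The containsValue schema from Fig. 2.
    static List<String> demo() {
        List<Invocation> t1 = List.<Invocation>of(m -> m.get(1), m -> m.containsValue(1));
        List<Invocation> t2 = List.<Invocation>of(m -> m.put(1, 1), m -> m.put(0, 1), m -> m.put(1, 0));
        return allShuffles(t1, t2);
    }
}
```

Running demo() yields 10 outcome strings, one per shuffle. They collapse to the four distinct expected outcomes listed in Fig. 3, and 1, false, null, null, 1 is not among them.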


import java.util.HashSet;
import java.util.concurrent.ConcurrentHashMap;

public class Test {
    public static class StringResult5 {
        @sun.misc.Contended public String r1;
        @sun.misc.Contended public String r2;
        @sun.misc.Contended public String r3;
        @sun.misc.Contended public String r4;
        @sun.misc.Contended public String r5;

        public String toString() {
            return r1 + ", " + r2 + ", " + r3 + ", " + r4 + ", " + r5;
        }
    }

    static StringResult5 results;
    static HashSet<String> expected;
    static ConcurrentHashMap obj;

    static {
        obj = new ConcurrentHashMap();
        results = new StringResult5();
        expected = new HashSet<String>();
        expected.add("0, true, null, null, 1");
        expected.add("1, true, null, null, 1");
        expected.add("null, true, null, null, 1");
        expected.add("null, false, null, null, 1");
    }

    static String stringify(Object object) { ... }

    public static void main(String[] args) throws InterruptedException {
        Thread thread1 = new Thread(() -> {
            results.r1 = stringify(obj.get(1));
            results.r2 = stringify(obj.containsValue(1));
        });
        Thread thread2 = new Thread(() -> {
            results.r3 = stringify(obj.put(1, 1));
            results.r4 = stringify(obj.put(0, 1));
            results.r5 = stringify(obj.put(1, 0));
        });
        thread1.start(); thread2.start();
        thread1.join(); thread2.join();
        assert expected.contains(results.toString());
    }
}


Fig. 3. Code generated for the containsValue schema of Fig. 2 for Java Pathfinder. Code generation for jcstress is similar, but conforms to the tool’s idiomatic test format, using annotations and built-in thread and outcome management.


Note that in contrast to existing approaches based on linearizability [20], 
including Line-Up [4], which enumerate linearizations per execution of a given 
program, Violat only enumerates linearizations once per schema. This is possible
for two reasons. First, by considering simple test programs in which all
invocations are known statically, we know the precise set of invocations (includ- 
ing argument values) to linearize even before executing the program. Second, 
according to sequential happens-before consistency [12], we consider the record- 
ing of real-time ordering among invocations infeasible on modern platforms like 
Java and C++11, which provide only weak ordering guarantees according to a 
platform-defined happens-before relation. This enables the static prediction of 
ordering constraints among invocations. While this static enumeration is also 
exponential in the number of invocations, it becomes an additive rather than 
multiplicative factor, amounting to significant performance gains in testing. 
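The two cost profiles can be compared with a rough count that ignores constant factors. Consider two threads with m and n invocations, and write E for the number of test executions. The number of linearizations L is the binomial coefficient below. Per-execution linearization pays roughly E times L, while Violat enumerates once and then needs only one set lookup per execution:

$$
L = \binom{m+n}{m} = \frac{(m+n)!}{m!\,n!}, \qquad \text{per-execution: } O(E \cdot L), \qquad \text{Violat: } O(L + E)
$$

For the containsValue schema (m = 2, n = 3), L = 10. The jcstress frequencies in Fig. 4 sum to 3,677,810 executions, so per-execution enumeration would examine on the order of 3.7 × 10^7 linearizations, whereas Violat examines 10 once.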
ConcurrentHashMap: containsValue
{ get(1); containsValue(1) } || { put(1,1); put(0,1); put(1,0) }


outcome                        atomic?   paths (JPF)   frequency (jcstress)
0, true, null, null, 1         ✓         3             13,287
1, false, null, null, 1        ✗         3             2
1, true, null, null, 1         ✓         3             16,417
null, false, null, null, 1     ✓         6             3,638,600
null, true, null, null, 1      ✓         3             9,504


Fig. 4. Observed outcomes for the containsValue schema of Fig. 2, recorded by Java Pathfinder and jcstress. Outcomes list return values in program-text order, e.g., get’s return value is listed first.


5 Code Generation and Back-End Integrations 


Once schemas are annotated with expected outcomes, the translation to actual 
test programs is fairly straightforward. Note that until this point, Violat is 
mainly agnostic to the underlying platform for which tests are being generated. 
The only exception is in computing the expected outcomes for schema lineariza- 
tions, which executes the given concurrent object implementation as a stand-in 
oracle for its implicit ADT specification. 

Figure 3 lists a simplification of the code generated for the containsValue 
schema of Fig. 2. The test program initializes a concurrent-object instance and 
a hash table of expected outcomes, then runs the schema’s threads in paral- 
lel, recording the results of each invocation, and checks, after threads complete, 
whether the recorded outcome is expected. To avoid added inter-thread inter- 
ference and the masking of potential weak-memory effects, each recorded result 
is isolated to a distinct cache line via Java’s @Contended annotation. The actual
generated code also includes exception handling, elided here for brevity. 

Our current implementation of Violat integrates with two analysis back-ends: 
the Java Concurrency Stress testing tool [42] (jcstress) and Java Pathfinder [47]. 
Figure 4 demonstrates the results of each tool on the code generated from 
the containsValue schema of Fig. 2. Each tool observes executions with the 4 
expected outcomes, as well as executions yielding an outcome that Violat does 
not predict, thus signaling a violation to observational refinement (and atom- 
icity). Java Pathfinder explores 18 program paths in a few seconds — achieving 
exhaustiveness via partial-order reduction [16] — while jcstress explores nearly 
4 million executions in 1s, observing the unpredicted outcome only twice. Aside 
from this example, Violat has uncovered consistency violations in over 50 methods of Java’s concurrent collections [9,10,12].
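Deciding whether a run signals a violation amounts to comparing each observed outcome against the precomputed expected set. The hypothetical Java helper below does this over a histogram of outcome counts, such as the jcstress column of Fig. 4. The class and method names are illustrative and are not part of Violat or jcstress.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical post-processing of an outcome histogram (not part of Violat or jcstress).
class OutcomeReport {
    // Observed outcomes that the expected set does not admit, with their counts.
    static Map<String, Long> unexpected(Map<String, Long> observed, Set<String> expected) {
        Map<String, Long> violations = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : observed.entrySet()) {
            if (!expected.contains(e.getKey())) violations.put(e.getKey(), e.getValue());
        }
        return violations;
    }

    // Total number of recorded executions.
    static long total(Map<String, Long> observed) {
        long sum = 0;
        for (long count : observed.values()) sum += count;
        return sum;
    }
}
```

The helper is fed the jcstress frequencies of Fig. 4 and the four expected outcomes of Fig. 3. For that input, unexpected returns only 1, false, null, null, 1 with count 2, out of 3,677,810 recorded executions.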
6 Usage


Violat is implemented as a Node.js command-line application, available from 
GitHub and npm.² Its basic functionality is provided by the command:


$ violat-validator ConcurrentHashMap.json


violation discovered


{ put(0,1); size(); contains(1) } || { put(0,0); put(1,1) }


outcome                    OK   frequency
0, 0, true, null, null     ✗    T
0, 1, true, null, null     ✓    703
0, 2, true, null, null     ✓    94,636
null, 1, false, 1, null    ✓    2,263
null, 1, true, 1, null     ✓    59,917
null, 2, true, 1, null     ✓    4


reporting violations among 100 generated programs. User-provided classes, indi- 
vidual schemas, program limits, and particular back-ends can also be specified: 
$ violat-validator MyConcurrentHashMap.json \ 
--jar MyCollections.jar \ 
--schema "{get(1); containsValue(1)} || {put(1,1); put(0,1); put(1,0)}" \ 
--max-programs 1000 \ 
--tester "Java Pathfinder" 


A full selection of parameters is available from the usage instructions: 


$ violat-validator --help 


7 Related Work 


Terragni and Pezzè survey several works on test generation for concurrent
objects [45]. Like Violat, Ballerina [31] and ConTeGe [33] enumerate tests 
randomly, while ConSuite [43], AutoConTest [44], and CovCon [6] exploit 
static analysis to compute potential shared-memory access conflicts to reduce 
redundancy among generated tests. Similarly, Omen [35-38], Narada [40], 
Intruder [39], and Minion [41] reduce redundancy by anticipating potential con- 
currency faults during sequential execution. Ballerina [31] and ConTeGe [33] 
compute linearizations, but only identify generic faults like data races, dead- 
locks, and exceptions, being neither sound nor complete for testing observational 
refinement: fault-free executions with un-admitted return-value combinations are 
false negatives, while faulting executions with admitted return-value combina- 
tions are generally false positives - many non-blocking concurrent objects exhibit 


² https://github.com/michael-emmi/violat.
data races by design. We consider the key innovations of these works, i.e., redundancy elimination, orthogonal and complementary to ours. While Pradel and
Gross do consider subclass substitutability [34], they only consider programs 
with two concurrent invocations, and require exhaustive enumeration of the 
superclass’s thread interleavings to calculate admitted outcomes. In contrast, 
Violat computes expected outcomes without interleaving method implementa- 
tions, i.e., considering them atomic. 

Others generate tests for memory consistency. TSOtool [17] generates ran- 
dom tests against the total-store order (TSO) model, while LCHECK [5] employs 
genetic algorithms. Mador-Haim et al. [26,27] generate litmus tests to distin- 
guish several memory models, including TSO, partial-store order (PSO), relaxed- 
memory order (RMO), and sequential consistency (SC). CppMem [2] considers 
the C++ memory model, while Herd [1] considers release-acquire (RA) and 
Power in addition to the aforementioned models. McVerSi [8] employs genetic 
algorithms to enhance test coverage, while Wickerson et al. [48] leverage the 
Alloy model finder [22]. In some sense, these works generate tests of observa- 
tional refinement for platforms implementing memory-system ADTs, i.e., with 
read and write operations, whereas Violat targets arbitrary ADTs, including 
collections with arbitrarily-rich sets of operations. 

Violat more closely follows work on linearizability checking. Herlihy and 
Wing [20] established the soundness of linearizability for observational refine- 
ment, and Filipovic et al. [14] established completeness. Wing and Gong [49] 
developed a linearizability-checking algorithm, which was later adopted by Line- 
Up [4] and optimized by Lowe [24]; while Violat pays the exponential cost of 
enumerating linearizations once per program, these approaches pay that cost per 
execution — an exponential quantity itself. Gibbons and Korach [15] established 
NP-hardness of per-execution linearizability checking for arbitrary objects, while 
Emmi and Enea [11] demonstrate tractability for collections. Bouajjani et al. [3] 
propose polynomial-time approximations, and Emmi et al. [13] demonstrate effi- 
cient symbolic algorithms. Finally, Emmi and Enea [9,10,12] apply Violat to 
checking atomicity and weak-consistency of Java concurrent objects. 